grouptransform

Transform by group

Description

example

G = grouptransform(T,groupvars,method) transforms the data in a table or timetable using the computation in method, grouping by the variables specified in groupvars. The output G is a table or timetable containing the transformed data in place of the nongrouping variables from T. For example, G = grouptransform(T,'Gender','norm') normalizes the data in T by gender using the 2-norm.

example

G = grouptransform(T,groupvars,groupbins,method) bins the transformed data according to groupbins, placing the groups at the end of the output table as additional variables. For example, G = grouptransform(T,'SaleDate','year','rescale') scales the data in T to the range [0,1] and bins by sale year.

G = grouptransform(___,datavars) specifies the table variables to transform for either of the previous syntaxes.

example

G = grouptransform(___,Name,Value) specifies additional grouping properties using one or more name-value pairs. For example, G = grouptransform(T,'Temp','linearfill','ReplaceValues',false) appends the filled data as an additional variable of T instead of replacing the nongrouping variables.

B = grouptransform(A,groupvars,method) returns the transformed data according to the unique combinations of grouping vectors in groupvars when A is a vector or matrix. groupvars can be a column vector, matrix, or cell array of column vectors.

example

B = grouptransform(A,groupvars,groupbins,method) bins the transformed data according to groupbins.

B = grouptransform(___,Name,Value) specifies additional grouping properties using one or more name-value pairs for either of the previous array syntaxes.

example

[B,BG] = grouptransform(A,___) also returns the rows of the grouping vectors.

Examples

collapse all

Create a timetable containing a progress status for 3 teams.

timeStamp = days([1 1 1 2 2 2 3 3 3]');
teamNumber = [1 2 3 1 2 3 1 2 3]';
percentComplete = [14.2 28.1 11.5 NaN NaN 19.3 46.1 51.2 30.3]';
T = timetable(timeStamp,teamNumber,percentComplete)
T=9×2 timetable
    timeStamp    teamNumber    percentComplete
    _________    __________    _______________

    1 day            1              14.2      
    1 day            2              28.1      
    1 day            3              11.5      
    2 days           1               NaN      
    2 days           2               NaN      
    2 days           3              19.3      
    3 days           1              46.1      
    3 days           2              51.2      
    3 days           3              30.3      

Fill missing status percentages (NaN) for each group using linear interpolation.

G = grouptransform(T,'teamNumber','linearfill','percentComplete')
G=9×2 timetable
    timeStamp    teamNumber    percentComplete
    _________    __________    _______________

    1 day            1               14.2     
    1 day            2               28.1     
    1 day            3               11.5     
    2 days           1              30.15     
    2 days           2              39.65     
    2 days           3               19.3     
    3 days           1               46.1     
    3 days           2               51.2     
    3 days           3               30.3     

To append the filled data to the original table instead of replacing the percentComplete variable, use the 'ReplaceValues' parameter.

Gappend = grouptransform(T,'teamNumber','linearfill','percentComplete','ReplaceValues',false)
Gappend=9×3 timetable
    timeStamp    teamNumber    percentComplete    linearfill_percentComplete
    _________    __________    _______________    __________________________

    1 day            1              14.2                     14.2           
    1 day            2              28.1                     28.1           
    1 day            3              11.5                     11.5           
    2 days           1               NaN                    30.15           
    2 days           2               NaN                    39.65           
    2 days           3              19.3                     19.3           
    3 days           1              46.1                     46.1           
    3 days           2              51.2                     51.2           
    3 days           3              30.3                     30.3           

Create a table of dates and corresponding profits.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10;...
                       2017 3 14; 2017 3 31; 2017 3 25;...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';
T = table(timeStamps,profit)
T=10×2 table
    timeStamps     profit
    ___________    ______

    04-Mar-2017     2032 
    02-Mar-2017     3071 
    15-Mar-2017     1185 
    10-Mar-2017     2587 
    14-Mar-2017     1998 
    31-Mar-2017     2899 
    25-Mar-2017     3112 
    29-Mar-2017      909 
    21-Mar-2017     2619 
    18-Mar-2017     3085 

Grouping by day name, normalize the profits by the 2-norm.

G = grouptransform(T,'timeStamps','dayname','norm')
G=10×3 table
    timeStamps     profit     dayname_timeStamps
    ___________    _______    __________________

    04-Mar-2017    0.42069        Saturday      
    02-Mar-2017          1        Thursday      
    15-Mar-2017    0.79344        Wednesday     
    10-Mar-2017    0.66582        Friday        
    14-Mar-2017    0.60654        Tuesday       
    31-Mar-2017    0.74612        Friday        
    25-Mar-2017    0.64428        Saturday      
    29-Mar-2017    0.60864        Wednesday     
    21-Mar-2017    0.79506        Tuesday       
    18-Mar-2017    0.63869        Saturday      

Create a vector of dates and a vector of corresponding profit values.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
                       2017 3 14; 2017 3 31; 2017 3 25; ...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';

Grouping by day of the week, normalize the profit values by the 2-norm. Display the transformed data and which group it corresponds to.

[normDailyProfit,dayOfWeek] = grouptransform(profit,timeStamps,'dayname','norm')
normDailyProfit = 10×1

    0.4207
    1.0000
    0.7934
    0.6658
    0.6065
    0.7461
    0.6443
    0.6086
    0.7951
    0.6387

dayOfWeek = 10x1 categorical
     Saturday 
     Thursday 
     Wednesday 
     Friday 
     Tuesday 
     Friday 
     Saturday 
     Wednesday 
     Tuesday 
     Saturday 

Input Arguments

collapse all

Input data, specified as a table or timetable.

Input array, specified as a vector or matrix.

Grouping variables or vectors, specified as a scalar, vector, matrix, cell array, function handle, or table vartype subscript.

For table or timetable input data, groupvars indicates which variables to use to compute the groups, and can be one of the following:

  • A character vector or string scalar specifying a single table variable name

  • A cell array of character vectors or a string array, where each element is a table variable name

  • A vector of table variable indices

  • A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it

  • A function handle that takes a table variable as input and returns a logical scalar

  • A table vartype subscript

For array input, groupvars can be either a column vector with the same number of rows as A or a group of column vectors arranged in a matrix or cell array.

Example: 'Age'

Example: {'Height','Weight'}

Example: vartype('numeric')

Computation method, specified as one of the following:

Method

Description

'zscore'

Normalize data to have mean 0 and standard deviation 1

'norm'

Normalize data by 2-norm

'meancenter'

Normalize data to have mean 0

'rescale'

Rescale range to [0,1]

'meanfill'

Fill missing values with the mean of the group data

'linearfill'

Fill missing values by linear interpolation of nonmissing group data

You can also specify a function handle that returns one entity whose first dimension has length 1 or has the same number of rows as the input data. If the function returns an entity with first dimension length equal to 1, then grouptransform repeats that value so that the output has the same number of rows as the input.

Data variables for table or timetable input, specified as a scalar, vector, cell array, function handle, or table vartype subscript. datavars indicates which variables of the input table to apply the methods to, and can be one of the following options:

  • A character vector or string scalar specifying a single table variable name

  • A cell array of character vectors or string array where each element is a table variable name

  • A vector of table variable indices

  • A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it

  • A function handle that takes a table variable as input and returns a logical scalar

  • A table vartype subscript

When datavars is not specified, grouptransform applies the computation to each nongrouping variable.

Example: 'Profit'

Example: {'Income','Expenses'}

Example: @isnumeric

Example: vartype('numeric')

Binning scheme, specified as one of the following options:

  • 'none', indicating no binning

  • A list of bin edges, specified as a numeric vector, or a datetime vector for datetime grouping variables

  • A number of bins, specified as an integer scalar

  • A time duration, specified as a scalar of type duration or calendarDuration indicating bin widths (for datetime or duration grouping variables only)

  • A cell array listing binning rules for each grouping variable

  • A time bin for datetime and duration grouping variables only, specified as one of the following character vectors:

    ValueDescriptionData Type
    'second'

    Each bin is 1 second.

    datetime and duration
    'minute'

    Each bin is 1 minute.

    datetime and duration
    'hour'

    Each bin is 1 hour.

    datetime and duration
    'day'

    Each bin is 1 calendar day. This value accounts for Daylight Saving Time shifts.

    datetime and duration
    'week'Each bin is 1 calendar week.datetime only
    'month'Each bin is 1 calendar month.datetime only
    'quarter'Each bin is 1 calendar quarter.datetime only
    'year'

    Each bin is 1 calendar year. This value accounts for leap days.

    datetime and duration
    'decade'Each bin is 1 decade (10 calendar years).datetime only
    'century'Each bin is 1 century (100 calendar years).datetime only
    'secondofminute'

    Bins are seconds from 0 to 59.

    datetime only
    'minuteofhour'

    Bins are minutes from 0 to 59.

    datetime only
    'hourofday'

    Bins are hours from 0 to 23.

    datetime only
    'dayofweek'

    Bins are days from 1 to 7. The first day of the week is Sunday.

    datetime only
    'dayname'Bins are full day names such as 'Sunday'.datetime only
    'dayofmonth'Bins are days from 1 to 31.datetime only
    'dayofyear'Bins are days from 1 to 366.datetime only
    'weekofmonth'Bins are weeks from 1 to 6.datetime only
    'weekofyear'Bins are weeks from 1 to 54.datetime only
    'monthname'Bins are full month names such as 'January'.datetime only
    'monthofyear'

    Bins are months from 1 to 12.

    datetime only
    'quarterofyear'Bins are quarters from 1 to 4.datetime only

When multiple grouping variables are specified, you can provide a single binning rule that is applied to all grouping variables, or a cell array containing a binning method for each grouping variable such as {'none',[0 2 4 Inf]}.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: G = grouptransform(T,groupvars,groupbins,'zscore','IncludedEdge','right')

Included bin edge, specified as either 'left' or 'right', indicating which end of the bin interval is inclusive.

This name-value pair can only be specified when groupbins is specified, and the value is applied to all binning schemes for all grouping variables or vectors.

Transform placement indicator, specified as a numeric or logical 1 (true) or 0 (false). When the parameter value is true, grouptransform outputs a table or array with the transformed data in place of the nongrouping variables or vectors from the input. When the parameter value is false, grouptransform appends the transformed data as additional matrix columns or table variables to the input data.

Output Arguments

collapse all

Output table, returned as a table containing the transformed data for each group.

Output array, returned as a vector or matrix containing the transformed data.

Grouping vectors for array input data, returned as a column vector or cell array of column vectors.

Tips

  • When making many calls to grouptransform, consider converting grouping variables to type categorical or logical when possible for improved performance. For example, if you have a grouping variable of type char (such as Gender with elements 'Male' and 'Female'), you can convert it to a categorical value using the command categorical(Gender).

Extended Capabilities

Introduced in R2018b