Unstack data from single variable into multiple variables
converts the table or timetable, U
= unstack(S
,vars
,ivar
)S
, to an equivalent table or
timetable, U
, that is unstacked. vars
specifies variables in S
, each of which is unstacked into
multiple variables in U
. In general, U
contains more variables, but fewer rows, than S
.
The ivar
input argument specifies the variable in
S
that unstack
uses as an indicator
variable. The values in ivar
determine which variables in
U
contain elements taken from vars
after
unstacking.
The unstack
function treats the remaining variables
differently in tables and timetables.
If S
is a table, then unstack
treats the remaining variables as grouping
variables. Each unique combination of values in the grouping
variables identifies a group of rows in S
that is
unstacked into a single row of U
.
If S
is a timetable, then unstack
discards the remaining variables. However, unstack
treats the vector of row times as a grouping variable.
You cannot unstack the row names of a table, or the row times of a timetable, or specify
either as the indicator variable. You can specify row names or row times as constant
variables with the 'ConstantVariables'
argument.
converts
the table or timetable U
= unstack(S
,vars
,ivar
,Name,Value
)S
with additional options
specified by one or more Name,Value
pair arguments.
For example, you can specify how unstack
converts variables from
S
to variables in U
.
Create a table indicating the amount of snowfall in various towns for various storms. Specify the towns using a categorical
array, since there are a fixed set of town names in this table.
Storm = [3;3;1;3;1;1;4;2;4;2;4;2]; Town = categorical({'Natick';'Worcester';'Natick';'Boston';'Boston';'Worcester';... 'Boston';'Natick';'Worcester';'Worcester';'Natick';'Boston'}); Snowfall = [0;3;5;5;9;10;12;13;15;16;17;21]; S = table(Storm,Town,Snowfall)
S=12×3 table
Storm Town Snowfall
_____ _________ ________
3 Natick 0
3 Worcester 3
1 Natick 5
3 Boston 5
1 Boston 9
1 Worcester 10
4 Boston 12
2 Natick 13
4 Worcester 15
2 Worcester 16
4 Natick 17
2 Boston 21
S
contains three snowfall entries for each storm, one for each town. S
is in stacked format, with Town
having the categorical
data type. Table variables that have the categorical
data type are useful indicator variables and grouping variables for unstacking.
Separate the variable Snowfall
into three variables, one for each town specified in the variable, Town
. The output table, U
, is in unstacked format.
U = unstack(S,'Snowfall','Town')
U=4×4 table
Storm Boston Natick Worcester
_____ ______ ______ _________
3 5 0 3
1 9 5 10
4 12 17 15
2 21 13 16
Each row in U
contains data from rows in S
that have the same value in the grouping variable, Storm
. The order of the unique values in Storm
determines the order of the data in U
.
Unstack data and apply an aggregation function to multiple rows in the same group that have the same values in the indicator variable.
Create a timetable containing data on the price of two stocks over two days. To specify the row times, use datetime
values. Specify the names of the stocks using a categorlcal
array since this timetable has a fixed set of stock names.
Date = [repmat(datetime('2008-04-12'),6,1);... repmat(datetime('2008-04-13'),5,1)]; Stock = categorical({'Stock1';'Stock2';'Stock1';'Stock2';... 'Stock2';'Stock2';'Stock1';'Stock2';... 'Stock2';'Stock1';'Stock2'}); Price = [60.35;27.68;64.19;25.47;28.11;27.98;... 63.85;27.55;26.43;65.73;25.94]; S = timetable(Date,Stock,Price)
S=11×2 timetable
Date Stock Price
___________ ______ _____
12-Apr-2008 Stock1 60.35
12-Apr-2008 Stock2 27.68
12-Apr-2008 Stock1 64.19
12-Apr-2008 Stock2 25.47
12-Apr-2008 Stock2 28.11
12-Apr-2008 Stock2 27.98
13-Apr-2008 Stock1 63.85
13-Apr-2008 Stock2 27.55
13-Apr-2008 Stock2 26.43
13-Apr-2008 Stock1 65.73
13-Apr-2008 Stock2 25.94
S
contains two prices for Stock1
during the first day and four prices for Stock2
during the first day.
Create a timetable containing separate variables for each stock and one row for each day. Use Date
(the vector of row times) as the grouping variable and apply the aggregation function, @mean
, to the numeric values from the variable, Price
, for each group.
[U,is] = unstack(S,'Price','Stock',... 'AggregationFunction',@mean)
U=2×2 timetable
Date Stock1 Stock2
___________ ______ ______
12-Apr-2008 62.27 27.31
13-Apr-2008 64.79 26.64
is = 2×1
1
7
U
contains the average price for each stock grouped by date.
is
identifies the index of the first value for each group of rows in S
. The first value for the group with the date April 13, 2008 is in the seventh row of S
.
S
— Input tableInput table, specified as a table or a timetable. S
must contain data
variables to unstack, vars
, and an indicator variable,
ivar
. The remaining variables in S
can be treated as either grouping variables or constant variables.
vars
— Variables in S
to unstackVariables in S
to unstack, specified as a positive integer, vector of
positive integers, character vector, cell array of character vectors, string
array, or logical vector.
ivar
— Indicator variable in S
Indicator variable in S
, specified as a positive integer, a character
vector, or a string scalar. The values in the variable specified by
ivar
indicate which variables in U
contain elements taken from the variables specified by
vars
.
The variable specified by ivar
can be a numeric vector, logical vector,
character array, cell array of character vectors, string array, or
categorical vector.
Specify optional
comma-separated pairs of Name,Value
arguments. Name
is
the argument name and Value
is the corresponding value.
Name
must appear inside quotes. You can specify several name and value
pair arguments in any order as
Name1,Value1,...,NameN,ValueN
.
'AggregationFunction',@mean
applies
the aggregation function @mean
to the values in vars
.'GroupingVariables'
— Grouping variables in S
that define groups of rowsGrouping variables in S
that define groups of rows, specified as the
comma-separated pair consisting of
'GroupingVariables'
and a positive integer,
vector of positive integers, character vector, cell array of character
vectors, string array, or logical vector. Each group of rows in
S
becomes one row in U
.
S
can have row labels along its first dimension. If S
is
a table, then it can have row names as the labels. If
S
is a timetable, then it must have row times as
the labels. unstack
can treat row labels as
grouping variables.
If you do not specify 'GroupingVariables'
,
and S
is a timetable, then
unstack
treats the row times as a
grouping variable.
If you specify 'GroupingVariables'
, and
S
has row names or row times, then
unstack
does not treat them as grouping
variables, unless you include them in the value of
'GroupingVariables'
.
'ConstantVariables'
— Variables constant within a groupVariables constant within a group, specified as the comma-separated pair consisting of
'ConstantVariables'
and a positive integer,
vector of positive integers, character vector, cell array of character
vectors, string array, or logical vector.
The values for these variables in U
are taken
from the first row in each group in S
.
You can include the row names or row times of S
when
you specify the value of 'ConstantVariables'
.
'NewDataVariableNames'
— Names for new data variables in U
Names for the new data variables in U
, specified as the comma-separated
pair consisting of 'NewDataVariableNames'
and a cell
array of character vectors or string array.
If you do not specify 'NewDataVariableNames'
, then
unstack
creates names for the new data
variables in U
based on values in the indicator
variable specified by ivar
.
'AggregationFunction'
— Aggregation function from values in vars
to single value@sum
(numeric data) or
@unique
(nonnumeric data) (default) | function handleAggregation function from values in vars
to a single value, specified as
the comma-separated pair consisting of
'AggregationFunction'
and a function handle.
unstack
applies this function to rows from the
same group that have the same value in ivar
. The
function must aggregate the data values into a single value.
If you do not specify the value of
'AggregationFunction'
, then
unstack
uses different default aggregation
functions depending on data type.
For numeric data, the default
aggregation function is sum
.
For nonnumeric data, the
default aggregation function is unique
.
If there are no data values to aggregate, because there are no data
values corresponding to a given indicator value in
ivar
after unstacking, then
unstack
must fill an empty element in the
unstacked output table. In that case, unstack
calls
the aggregation function with an empty array as input. The value that
unstack
fills in depends on what the
aggregation function returns when its input is an empty array.
Value Returned by Aggregation Function When No Data to Aggregate | Value Inserted into Empty Elements of Unstacked Table |
---|---|
Aggregation function raises error |
|
Empty array | Fill value of the appropriate data type
( Example: If the aggregation function
is |
Scalar value | Scalar value returned from aggregation function. Example: If the aggregation function
is |
Vector, matrix, or multidimensional array |
|
'VariableNamingRule'
— Rule for naming variables in U
'modify'
(default) | 'preserve'
Rule for naming variables in U
, specified as the
comma-separated pair consisting of
'VariableNamingRule'
and either the value
'modify'
or 'preserve'
.
The values of 'VariableNamingRule'
specify the
following rules for naming variable in the output table or
timetable.
Value of
| Rule |
---|---|
| Modify names taken from the input table or timetable so that the corresponding variable names in the output are also valid MATLAB® identifiers. |
| Preserve original names taken from the input table or timetable. The corresponding variable names in the output can have any Unicode® characters, including spaces and non-ASCII characters. Note: In some cases,
|
U
— Output tableOutput table, returned as a table or a timetable. U
contains
the unstacked data variables, the grouping variables, and the first
value of each group from any constant variables.
The order of the data in U
is based on the
order of the unique values in the grouping variables.
You can store additional metadata such as descriptions, variable units, variable names, and
row names in U
. For more information, see the Properties
sections of table
or timetable
.
is
— Index to S
Index to S
, returned as a column vector.
For each row in U
, the index vector, is
,
identifies the index of the first value in the corresponding group
of rows in S
.
Grouping variables are utility variables used to group, or categorize, data. Grouping variables are useful for summarizing or visualizing data by group. You can define groups in your table by specifying one or more grouping variables.
A grouping variable can be any of the following:
Categorical vector
Cell array of character vectors
String array
Character array
Numeric vector, typically containing positive integers
Logical vector
Rows that have the same grouping variable value belong to the same group. If you use multiple grouping variables, rows that have the same combination of grouping variable values belong to the same group.
You can specify more than one data variable in S
, and each
variable becomes a set of unstacked data variables in U
. Use
a vector of positive integers, a cell array or string array containing multiple
variable names, or a logical vector to specify vars
. The one
indicator variable, specified by the input argument, ivar
,
applies to all data variables specifies by vars
.
Behavior changed in R2020a
In R2020a, if you do not specify the 'AggregationFunction'
name-value pair argument, then the default aggregation function for nonnumeric data
is the unique
function. In previous releases, there was no
default aggregation function for nonnumeric data, so unstack
would raise an error.
Behavior changed in R2020a
In R2020a, there are behavior changes when the aggregation function has no data to
aggregate. This situation can occur when there are no data values that correspond to
values in the indicator variable after unstacking. In such cases,
unstack
essentially calls the aggregation function on an
empty array.
Value Returned by Aggregation Function When No Data to Aggregate | Behavior in R2020a | Behavior in Previous Releases |
---|---|---|
Data variable is numeric and the aggregation function raises an error. |
|
|
Data variable is nonnumeric and the aggregation function returns an empty array. |
|
|
Data variable is numeric and the aggregation function
returns a scalar value (for example, |
|
|
Data variable is numeric and the aggregation function returns a vector, matrix, or multidimensional array. |
|
|
join
| Join Tables | stack
| Stack Table
Variables | Unstack Table
Variables
You have a modified version of this example. Do you want to open this example with your edits?