This example shows how to merge dataset arrays
using join
.
Import the data from the first worksheet in
hospitalSmall.xlsx
into a dataset array, then keep only a
few of the variables.
ds1 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx')); ds1 = ds1(:,{'id','name','sex','age'})
ds1 = id name sex age 'YPL-320' 'SMITH' 'm' 38 'GLI-532' 'JOHNSON' 'm' 43 'PNI-258' 'WILLIAMS' 'f' 38 'MIJ-579' 'JONES' 'f' 40 'XLK-030' 'BROWN' 'f' 49 'TFP-518' 'DAVIS' 'f' 46 'LPD-746' 'MILLER' 'f' 33 'ATA-945' 'WILSON' 'm' 40 'VNL-702' 'MOORE' 'm' 28 'LQW-768' 'TAYLOR' 'f' 31 'QFY-472' 'ANDERSON' 'f' 45 'UJG-627' 'THOMAS' 'f' 42 'XUE-826' 'JACKSON' 'm' 25 'TRW-072' 'WHITE' 'm' 39
The dataset array, ds1
, has 14 observations (rows) and 4
variables (columns).
Import the data from the worksheet Heights2
in hospitalSmall.xlsx
.
ds2 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sheet','Heights2')
ds2 = id hgt 'LPD-746' 61 'PNI-258' 62 'XUE-826' 71 'ATA-945' 72 'XLK-030' 63
ds2
has height measurements for a subset
of five individuals from the first dataset array, ds1
.
Use join
to merge the two dataset arrays, ds1
and ds2
,
keeping only the subset of observations that are in ds2
.
JoinSmall = join(ds2,ds1)
JoinSmall = id hgt name sex age 'LPD-746' 61 'MILLER' 'f' 33 'PNI-258' 62 'WILLIAMS' 'f' 38 'XUE-826' 71 'JACKSON' 'm' 25 'ATA-945' 72 'WILSON' 'm' 40 'XLK-030' 63 'BROWN' 'f' 49
In JoinSmall
, the variable id
only
appears once. This is because it is the key variable—the variable
that links observations between the two dataset arrays—and
has the same variable name in both ds1
and ds2
.
Merge ds1
and ds2
keeping
all observations in the larger ds1
.
joinAll = join(ds2,ds1,'type','rightouter','mergekeys',true)
joinAll = id hgt name sex age 'ATA-945' 72 'WILSON' 'm' 40 'GLI-532' NaN 'JOHNSON' 'm' 43 'LPD-746' 61 'MILLER' 'f' 33 'LQW-768' NaN 'TAYLOR' 'f' 31 'MIJ-579' NaN 'JONES' 'f' 40 'PNI-258' 62 'WILLIAMS' 'f' 38 'QFY-472' NaN 'ANDERSON' 'f' 45 'TFP-518' NaN 'DAVIS' 'f' 46 'TRW-072' NaN 'WHITE' 'm' 39 'UJG-627' NaN 'THOMAS' 'f' 42 'VNL-702' NaN 'MOORE' 'm' 28 'XLK-030' 63 'BROWN' 'f' 49 'XUE-826' 71 'JACKSON' 'm' 25 'YPL-320' NaN 'SMITH' 'm' 38
ds1
without corresponding height
measurements in ds2
has height value NaN
.
Also, because there is no id
value in ds2
for
each observation in ds1
, you need to merge the
keys using the option 'MergeKeys',true
. This merges
the key variable, id
.When using join
, it is not necessary for
the key variable to have the same name in the dataset arrays to be
merged. Import the data from the worksheet named Heights3
in hospitalSmall.xlsx
.
ds3 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sheet','Heights3')
ds3 = identifier hgt 'GLI-532' 69 'QFY-472' 62 'MIJ-579' 61 'VNL-702' 68 'XLK-030' 63 'LPD-746' 61 'TFP-518' 60 'YPL-320' 71 'ATA-945' 72 'LQW-768' 64 'PNI-258' 62 'UJG-627' 61 'XUE-826' 71 'TRW-072' 69
ds3
has height measurements for each observation
in ds1
. This dataset array has the same patient
identifiers as ds1
, but they are under the variable
name identifier
, instead of id
(and
in a different order).
You can easily change the variable name of the key variable
in ds3
by setting d3.Properties.VarNames
or
using the Variables editor, but it is not required to perform a merge.
Instead, you can specify the name of the key variable in each dataset
array using LeftKeys
and RightKeys
.
joinDiff = join(ds3,ds1,'LeftKeys','identifier','RightKeys','id')
joinDiff = identifier hgt name sex age 'GLI-532' 69 'JOHNSON' 'm' 43 'QFY-472' 62 'ANDERSON' 'f' 45 'MIJ-579' 61 'JONES' 'f' 40 'VNL-702' 68 'MOORE' 'm' 28 'XLK-030' 63 'BROWN' 'f' 49 'LPD-746' 61 'MILLER' 'f' 33 'TFP-518' 60 'DAVIS' 'f' 46 'YPL-320' 71 'SMITH' 'm' 38 'ATA-945' 72 'WILSON' 'm' 40 'LQW-768' 64 'TAYLOR' 'f' 31 'PNI-258' 62 'WILLIAMS' 'f' 38 'UJG-627' 61 'THOMAS' 'f' 42 'XUE-826' 71 'JACKSON' 'm' 25 'TRW-072' 69 'WHITE' 'm' 39
The merged dataset array, joinDiff
, has the
same key variable order and name as the first dataset array input
to join
, ds3
.