This example shows how to reformat dataset
arrays using stack
and unstack
.
Import the data from the comma-separated text file
testScores.csv
.
ds = dataset('File','testScores.csv','Delimiter',',')
ds = LastName Sex Test1 Test2 Test3 Test4 'HOWARD' 'male' 90 87 93 92 'WARD' 'male' 87 85 83 90 'TORRES' 'male' 86 85 88 86 'PETERSON' 'female' 75 80 72 77 'GRAY' 'female' 89 86 87 90 'RAMIREZ' 'female' 96 92 98 95 'JAMES' 'male' 78 75 77 77 'WATSON' 'female' 91 94 92 90 'BROOKS' 'female' 86 83 85 89 'KELLY' 'male' 79 76 82 80
Each of the 10 students has 4 test scores.
With the data in this format, you can, for example, calculate the average test score for each student. The test scores are in columns 3 to 6.
ds.TestAve = mean(double(ds(:,3:6)),2); ds(:,{'LastName','Sex','TestAve'})
ans = LastName Sex TestAve 'HOWARD' 'male' 90.5 'WARD' 'male' 86.25 'TORRES' 'male' 86.25 'PETERSON' 'female' 76 'GRAY' 'female' 88 'RAMIREZ' 'female' 95.25 'JAMES' 'male' 76.75 'WATSON' 'female' 91.75 'BROOKS' 'female' 85.75 'KELLY' 'male' 79.25
A new variable with average test scores is added to the dataset
array, ds
.
Stack the test score variables into a new variable, Scores
.
dsNew = stack(ds,{'Test1','Test2','Test3','Test4'},... 'newDataVarName','Scores')
dsNew = LastName Sex TestAve Scores_Indicator Scores 'HOWARD' 'male' 90.5 Test1 90 'HOWARD' 'male' 90.5 Test2 87 'HOWARD' 'male' 90.5 Test3 93 'HOWARD' 'male' 90.5 Test4 92 'WARD' 'male' 86.25 Test1 87 'WARD' 'male' 86.25 Test2 85 'WARD' 'male' 86.25 Test3 83 'WARD' 'male' 86.25 Test4 90 'TORRES' 'male' 86.25 Test1 86 'TORRES' 'male' 86.25 Test2 85 'TORRES' 'male' 86.25 Test3 88 'TORRES' 'male' 86.25 Test4 86 'PETERSON' 'female' 76 Test1 75 'PETERSON' 'female' 76 Test2 80 'PETERSON' 'female' 76 Test3 72 'PETERSON' 'female' 76 Test4 77 'GRAY' 'female' 88 Test1 89 'GRAY' 'female' 88 Test2 86 'GRAY' 'female' 88 Test3 87 'GRAY' 'female' 88 Test4 90 'RAMIREZ' 'female' 95.25 Test1 96 'RAMIREZ' 'female' 95.25 Test2 92 'RAMIREZ' 'female' 95.25 Test3 98 'RAMIREZ' 'female' 95.25 Test4 95 'JAMES' 'male' 76.75 Test1 78 'JAMES' 'male' 76.75 Test2 75 'JAMES' 'male' 76.75 Test3 77 'JAMES' 'male' 76.75 Test4 77 'WATSON' 'female' 91.75 Test1 91 'WATSON' 'female' 91.75 Test2 94 'WATSON' 'female' 91.75 Test3 92 'WATSON' 'female' 91.75 Test4 90 'BROOKS' 'female' 85.75 Test1 86 'BROOKS' 'female' 85.75 Test2 83 'BROOKS' 'female' 85.75 Test3 85 'BROOKS' 'female' 85.75 Test4 89 'KELLY' 'male' 79.25 Test1 79 'KELLY' 'male' 79.25 Test2 76 'KELLY' 'male' 79.25 Test3 82 'KELLY' 'male' 79.25 Test4 80
The original test variable names, Test1
, Test2
, Test3
,
and Test4
, appear as levels in the combined test
scores indicator variable, Scores_Indicator
.
With the data in this format, you can use Scores_Indicator
as
a grouping variable, and draw box plots of test scores grouped by
test.
figure() boxplot(dsNew.Scores,dsNew.Scores_Indicator)
Reformat dsNew
back into its original format.
dsOrig = unstack(dsNew,'Scores','Scores_Indicator'); dsOrig(:,{'LastName','Test1','Test2','Test3','Test4'})
ans = LastName Test1 Test2 Test3 Test4 'HOWARD' 90 87 93 92 'WARD' 87 85 83 90 'TORRES' 86 85 88 86 'PETERSON' 75 80 72 77 'GRAY' 89 86 87 90 'RAMIREZ' 96 92 98 95 'JAMES' 78 75 77 77 'WATSON' 91 94 92 90 'BROOKS' 86 83 85 89 'KELLY' 79 76 82 80
The dataset array is back in wide format. unstack
reassigns
the levels of the indicator variable, Scores_Indicator
,
as variable names in the unstacked dataset array.
dataset
| double
| stack
| unstack