This example shows how to perform calculations on dataset arrays.
Import the data from the comma-separated text file testScores.csv.
ds = dataset('File','testScores.csv','Delimiter',',')
ds = 

    LastName        Sex         Test1    Test2    Test3    Test4
    'HOWARD'        'male'      90       87       93       92   
    'WARD'          'male'      87       85       83       90   
    'TORRES'        'male'      86       85       88       86   
    'PETERSON'      'female'    75       80       72       77   
    'GRAY'          'female'    89       86       87       90   
    'RAMIREZ'       'female'    96       92       98       95   
    'JAMES'         'male'      78       75       77       77   
    'WATSON'        'female'    91       94       92       90   
    'BROOKS'        'female'    86       83       85       89   
    'KELLY'         'male'      79       76       82       80   
There are 4 test scores for each of 10 students, in wide format.
Compute the average (mean) test score for each student in the dataset array, and store it in a new variable, TestAvg. Test scores are in columns 3 to 6.
Use double to convert the specified dataset array variables into a numeric array. Then, calculate the mean across the second dimension (across columns) to get the test average for each student.
ds.TestAvg = mean(double(ds(:,3:6)),2);
ds(:,{'LastName','TestAvg'})
ans = 

    LastName        TestAvg
    'HOWARD'         90.5  
    'WARD'          86.25  
    'TORRES'        86.25  
    'PETERSON'         76  
    'GRAY'             88  
    'RAMIREZ'       95.25  
    'JAMES'         76.75  
    'WATSON'        91.75  
    'BROOKS'        85.75  
    'KELLY'         79.25  
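To make the role of the dimension argument concrete, here is a minimal sketch on a plain numeric matrix. The variable names scores, rowAvg, and colAvg are illustrative only, and the values are the first three students' scores copied from the data shown earlier.

```matlab
% Scores for three students (rows) on four tests (columns).
scores = [90 87 93 92;
          87 85 83 90;
          86 85 88 86];

% Dimension 2 runs across columns, so this yields one average per student.
rowAvg = mean(scores,2);   % 3-by-1: [90.5; 86.25; 86.25]

% Dimension 1 runs down rows, so this yields one average per test instead.
colAvg = mean(scores,1);   % 1-by-4
```

Passing 2 as the second argument is what gives a per-student result; leaving it out would average each test across students.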
Compute the mean and maximum average test scores for each gender.
stats = grpstats(ds,'Sex',{'mean','max'},'DataVars','TestAvg')
stats = 

              Sex         GroupCount    mean_TestAvg    max_TestAvg
    male      'male'      5             83.8            90.5       
    female    'female'    5             87.35           95.25      
This returns a new dataset array containing the specified summary statistics for each level of the grouping variable, Sex.
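As a cross-check on the grouped statistics, the same numbers can be reproduced with logical indexing on plain arrays. This is only a sketch: the vectors testAvg and isMale are hypothetical stand-ins typed in from the outputs shown earlier, not variables created by the example.

```matlab
% Student averages in the order they appear in the dataset array.
testAvg = [90.5 86.25 86.25 76 88 95.25 76.75 91.75 85.75 79.25]';

% true where Sex is 'male', in the same row order.
isMale = logical([1 1 1 0 0 0 1 0 0 1]');

maleMean   = mean(testAvg(isMale));    % mean of the five male averages
femaleMean = mean(testAvg(~isMale));   % mean of the five female averages
maleMax    = max(testAvg(isMale));     % highest male average
femaleMax  = max(testAvg(~isMale));    % highest female average
```

The results agree with the mean_TestAvg and max_TestAvg columns (83.8 and 90.5 for male, 87.35 and 95.25 for female), up to floating-point rounding in the means.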
Each test is scored out of 100. Rescale the test scores so that each is out of 25.
scores = double(ds(:,3:6));
newScores = scores*25/100;
ds = replacedata(ds,newScores,3:6)
ds = 

    LastName        Sex         Test1    Test2    Test3    Test4    TestAvg
    'HOWARD'        'male'       22.5    21.75    23.25       23     90.5  
    'WARD'          'male'      21.75    21.25    20.75     22.5    86.25  
    'TORRES'        'male'       21.5    21.25       22     21.5    86.25  
    'PETERSON'      'female'    18.75       20       18    19.25       76  
    'GRAY'          'female'    22.25     21.5    21.75     22.5       88  
    'RAMIREZ'       'female'       24       23     24.5    23.75    95.25  
    'JAMES'         'male'       19.5    18.75    19.25    19.25    76.75  
    'WATSON'        'female'    22.75     23.5       23     22.5    91.75  
    'BROOKS'        'female'     21.5    20.75    21.25    22.25    85.75  
    'KELLY'         'male'      19.75       19     20.5       20    79.25  
The first two lines of code extract the test data and perform the desired calculation. Then, replacedata inserts the new test scores back into the dataset array.
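The extract, compute, and replace steps can also be collapsed into one call, assuming your release of replacedata accepts a function handle in place of the replacement array. The sketch below uses a small stand-in dataset array named small, built in-line from the first two students so that it does not depend on testScores.csv; the names small and rescaled are illustrative only.

```matlab
% Stand-in dataset array: two students, four test scores each.
small = dataset({{'HOWARD';'WARD'},'LastName'}, ...
                {[90 87 93 92; 87 85 83 90],'Test1','Test2','Test3','Test4'});

% Apply the rescaling function to variables 2 to 5 and write the
% results back into the same variables.
small = replacedata(small,@(x) x*25/100,2:5);

% Pull the rescaled scores out as a numeric array for inspection.
rescaled = double(small(:,2:5));
```

The three-step version is easier to debug because the intermediate numeric array can be inspected, while the function-handle form avoids the temporary variables.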
The variable of test score averages, TestAvg, is now the final score for each student. Change the variable name to Final.
ds.Properties.VarNames{end} = 'Final';
ds
ds = 

    LastName        Sex         Test1    Test2    Test3    Test4    Final
    'HOWARD'        'male'       22.5    21.75    23.25       23     90.5
    'WARD'          'male'      21.75    21.25    20.75     22.5    86.25
    'TORRES'        'male'       21.5    21.25       22     21.5    86.25
    'PETERSON'      'female'    18.75       20       18    19.25       76
    'GRAY'          'female'    22.25     21.5    21.75     22.5       88
    'RAMIREZ'       'female'       24       23     24.5    23.75    95.25
    'JAMES'         'male'       19.5    18.75    19.25    19.25    76.75
    'WATSON'        'female'    22.75     23.5       23     22.5    91.75
    'BROOKS'        'female'     21.5    20.75    21.25    22.25    85.75
    'KELLY'         'male'      19.75       19     20.5       20    79.25
See also: dataset | double | grpstats | replacedata