This example shows how to optimize an SVM classification using the fitcsvm function and the OptimizeHyperparameters name-value pair. The classification works on locations of points from a Gaussian mixture model. Hastie, Tibshirani, and Friedman describe the model on page 17 of The Elements of Statistical Learning (2009). The model begins by generating 10 base points for a "green" class, distributed as 2-D independent normals with mean (1,0) and unit variance. It also generates 10 base points for a "red" class, distributed as 2-D independent normals with mean (0,1) and unit variance. For each class (green and red), generate 100 random points as follows:
1. Choose a base point m of the appropriate color uniformly at random.
2. Generate an independent random point from a 2-D normal distribution with mean m and covariance I/5, where I is the 2-by-2 identity matrix. This example uses covariance I/50 instead, to show the advantage of optimization more clearly.
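The two-step recipe can also be written without a loop. This is a minimal sketch for one class that assumes base MATLAB only (randn and randi instead of mvnrnd); the variable names are illustrative and not part of the original example. Because the covariance is a scalar multiple of the identity, sigma2*I, a draw from N(m, sigma2*I) is simply m + sqrt(sigma2)*randn(1,2).

```matlab
% Sketch: sample one class of the Gaussian mixture with base MATLAB only.
% For covariance sigma2*I, a draw from N(m, sigma2*I) is m + sqrt(sigma2)*randn(1,2).
nBase   = 10;     % base points per class
nPoints = 100;    % data points per class
sigma2  = 0.02;   % per-coordinate variance (I/50, as in this example)

% Base points: 2-D independent normals with mean (1,0) and unit variance
grnBase = [1 0] + randn(nBase,2);

% Step 1: pick a base point index uniformly at random for every data point
idx = randi(nBase,nPoints,1);

% Step 2: add independent N(0, sigma2*I) noise around the chosen base point
grnSample = grnBase(idx,:) + sqrt(sigma2)*randn(nPoints,2);
```

This has the same distribution as the mvnrnd loop with eye(2)*0.02 used in this example, but it consumes random numbers in a different order, so the points need not match for the same seed.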
Generate the 10 base points for each class.
rng default % For reproducibility
grnpop = mvnrnd([1,0],eye(2),10);
redpop = mvnrnd([0,1],eye(2),10);
View the base points.
plot(grnpop(:,1),grnpop(:,2),'go')
hold on
plot(redpop(:,1),redpop(:,2),'ro')
hold off
Since some red base points are close to green base points, it can be difficult to classify the data points based on location alone.
Generate the 100 data points of each class.
redpts = zeros(100,2);
grnpts = redpts;
for i = 1:100
    grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02);
    redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02);
end
View the data points.
figure
plot(grnpts(:,1),grnpts(:,2),'go')
hold on
plot(redpts(:,1),redpts(:,2),'ro')
hold off
Put the data into one matrix, and make a vector grp that labels the class of each point.
cdata = [grnpts;redpts];
grp = ones(200,1);
% Green label 1, red label -1
grp(101:200) = -1;
Set up a partition for cross-validation. This step fixes the train and test sets that the optimization uses at each step.
c = cvpartition(200,'KFold',10);
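Because the partition object stores its fold assignment, every call that receives c trains and tests on exactly the same subsets. The sketch below checks this for the 200-observation, 10-fold setup using the cvpartition methods test and training and the NumTestSets and TestSize properties; the checking loop itself is illustrative and not part of the original example.

```matlab
% Sketch: inspect a 10-fold partition of 200 observations.
c = cvpartition(200,'KFold',10);

disp(c.NumTestSets)   % number of folds
disp(c.TestSize)      % number of held-out observations in each fold

% Logical index vectors for fold 1: training and test sets do not overlap
isTest1  = test(c,1);
isTrain1 = training(c,1);
assert(~any(isTest1 & isTrain1))

% Across all folds, each observation is held out exactly once
heldOut = zeros(200,1);
for k = 1:c.NumTestSets
    heldOut = heldOut + test(c,k);
end
```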
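To find a good fit, meaning one with a low cross-validation loss, set options to use Bayesian optimization. Use the same cross-validation partition c in all optimizations.
For reproducibility, use the 'expected-improvement-plus' acquisition function. The default acquisition function, 'expected-improvement-per-second-plus', depends on measured run times, so its choices can vary from run to run.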
opts = struct('Optimizer','bayesopt','ShowPlots',true,'CVPartition',c,...
    'AcquisitionFunctionName','expected-improvement-plus');
svmmod = fitcsvm(cdata,grp,'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',opts)
|=====================================================================================================|
| Iter | Eval   | Objective | Objective | BestSoFar  | BestSoFar | BoxConstraint| KernelScale |
|      | result |           | runtime   | (observed) | (estim.)  |              |             |
|=====================================================================================================|
| 1 | Best | 0.345 | 0.26549 | 0.345 | 0.345 | 0.00474 | 306.44 |
| 2 | Best | 0.115 | 0.15695 | 0.115 | 0.12678 | 430.31 | 1.4864 |
| 3 | Accept | 0.52 | 0.13092 | 0.115 | 0.1152 | 0.028415 | 0.014369 |
| 4 | Accept | 0.61 | 0.14335 | 0.115 | 0.11504 | 133.94 | 0.0031427 |
| 5 | Accept | 0.34 | 0.17594 | 0.115 | 0.11504 | 0.010993 | 5.7742 |
| 6 | Best | 0.085 | 0.14682 | 0.085 | 0.085039 | 885.63 | 0.68403 |
| 7 | Accept | 0.105 | 0.13317 | 0.085 | 0.085428 | 0.3057 | 0.58118 |
| 8 | Accept | 0.21 | 0.14519 | 0.085 | 0.09566 | 0.16044 | 0.91824 |
| 9 | Accept | 0.085 | 0.19161 | 0.085 | 0.08725 | 972.19 | 0.46259 |
| 10 | Accept | 0.1 | 0.17136 | 0.085 | 0.090952 | 990.29 | 0.491 |
| 11 | Best | 0.08 | 0.14907 | 0.08 | 0.079362 | 2.5195 | 0.291 |
| 12 | Accept | 0.09 | 0.1253 | 0.08 | 0.08402 | 14.338 | 0.44386 |
| 13 | Accept | 0.1 | 0.15751 | 0.08 | 0.08508 | 0.0022577 | 0.23803 |
| 14 | Accept | 0.11 | 0.1306 | 0.08 | 0.087378 | 0.2115 | 0.32109 |
| 15 | Best | 0.07 | 0.15539 | 0.07 | 0.081507 | 910.2 | 0.25218 |
| 16 | Best | 0.065 | 0.13597 | 0.065 | 0.072457 | 953.22 | 0.26253 |
| 17 | Accept | 0.075 | 0.14395 | 0.065 | 0.072554 | 998.74 | 0.23087 |
| 18 | Accept | 0.295 | 0.15592 | 0.065 | 0.072647 | 996.18 | 44.626 |
| 19 | Accept | 0.07 | 0.18332 | 0.065 | 0.06946 | 985.37 | 0.27389 |
| 20 | Accept | 0.165 | 0.15914 | 0.065 | 0.071622 | 0.065103 | 0.13679 |
| 21 | Accept | 0.345 | 0.16119 | 0.065 | 0.071764 | 971.7 | 999.01 |
| 22 | Accept | 0.61 | 0.15335 | 0.065 | 0.071967 | 0.0010168 | 0.0010005 |
| 23 | Accept | 0.345 | 0.12802 | 0.065 | 0.071959 | 0.0010674 | 999.18 |
| 24 | Accept | 0.35 | 0.11611 | 0.065 | 0.071863 | 0.0010003 | 40.628 |
| 25 | Accept | 0.24 | 0.19154 | 0.065 | 0.072124 | 996.55 | 10.423 |
| 26 | Accept | 0.61 | 0.18784 | 0.065 | 0.072068 | 958.64 | 0.0010026 |
| 27 | Accept | 0.47 | 0.1372 | 0.065 | 0.07218 | 993.69 | 0.029723 |
| 28 | Accept | 0.3 | 0.12571 | 0.065 | 0.072291 | 993.15 | 170.01 |
| 29 | Accept | 0.16 | 0.28753 | 0.065 | 0.072104 | 992.81 | 3.8594 |
| 30 | Accept | 0.365 | 0.1363 | 0.065 | 0.072112 | 0.0010017 | 0.044287 |
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 49.5691 seconds
Total objective function evaluation time: 4.7818

Best observed feasible point:
    BoxConstraint    KernelScale
    _____________    ___________

       953.22          0.26253

Observed objective function value = 0.065
Estimated objective function value = 0.073726
Function evaluation time = 0.13597

Best estimated feasible point (according to models):
    BoxConstraint    KernelScale
    _____________    ___________

       985.37          0.27389

Estimated objective function value = 0.072112
Estimated function evaluation time = 0.16104
svmmod = 
  ClassificationSVM
                         ResponseName: 'Y'
                CategoricalPredictors: []
                           ClassNames: [-1 1]
                       ScoreTransform: 'none'
                      NumObservations: 200
    HyperparameterOptimizationResults: [1x1 BayesianOptimization]
                                Alpha: [77x1 double]
                                 Bias: -0.2352
                     KernelParameters: [1x1 struct]
                       BoxConstraints: [200x1 double]
                      ConvergenceInfo: [1x1 struct]
                      IsSupportVector: [200x1 logical]
                               Solver: 'SMO'
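The HyperparameterOptimizationResults property in this display is a BayesianOptimization object, so the best points in the summary can also be read programmatically. This is a minimal, self-contained sketch on hypothetical toy data (not the example's cdata) with a short budget of five evaluations; XAtMinObjective holds the best observed point, and XAtMinEstimatedObjective holds the best point according to the fitted objective model.

```matlab
% Sketch: run a short optimization and read the results object.
rng default
X = [randn(50,2)+[1 0]; randn(50,2)+[0 1]];   % hypothetical toy data
y = [ones(50,1); -ones(50,1)];
c = cvpartition(100,'KFold',5);

opts = struct('Optimizer','bayesopt','ShowPlots',false,'Verbose',0,...
    'CVPartition',c,'MaxObjectiveEvaluations',5,...
    'AcquisitionFunctionName','expected-improvement-plus');
mdl = fitcsvm(X,y,'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',opts);

results = mdl.HyperparameterOptimizationResults;   % BayesianOptimization object
bestObserved  = results.XAtMinObjective;            % table of best observed hyperparameters
bestEstimated = results.XAtMinEstimatedObjective;   % best point according to the model
fprintf('Minimum observed cross-validation loss: %g\n',results.MinObjective);
```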
Find the cross-validation loss of the optimized model.
lossnew = kfoldLoss(fitcsvm(cdata,grp,'CVPartition',c,'KernelFunction','rbf',...
    'BoxConstraint',svmmod.HyperparameterOptimizationResults.XAtMinObjective.BoxConstraint,...
    'KernelScale',svmmod.HyperparameterOptimizationResults.XAtMinObjective.KernelScale))
lossnew = 0.0650
This loss is the same as the loss reported in the optimization output under "Observed objective function value".
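With its default loss function, kfoldLoss reports the misclassification rate of the out-of-fold predictions. As a sketch of that computation, assuming equal observation weights and balanced classes (as in this example), the loop below rebuilds the data and partition with the same calls and seed, trains one model per fold of c, and counts the held-out errors. It uses the rounded hyperparameter values printed in the optimization summary, so the result should be close to lossnew but is not guaranteed to equal it.

```matlab
% Rebuild the example's data and partition (same calls, same order, same seed)
rng default
grnpop = mvnrnd([1,0],eye(2),10);
redpop = mvnrnd([0,1],eye(2),10);
redpts = zeros(100,2);
grnpts = redpts;
for i = 1:100
    grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02);
    redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02);
end
cdata = [grnpts;redpts];
grp = [ones(100,1); -ones(100,1)];   % green label 1, red label -1
c = cvpartition(200,'KFold',10);

% Rounded hyperparameters from "Best observed feasible point"
box = 953.22;
ks  = 0.26253;

% Train on each training set, predict its test set, and count errors
nErr = 0;
for k = 1:c.NumTestSets
    trn = training(c,k);
    tst = test(c,k);
    mdl = fitcsvm(cdata(trn,:),grp(trn),'KernelFunction','rbf',...
        'BoxConstraint',box,'KernelScale',ks);
    nErr = nErr + sum(predict(mdl,cdata(tst,:)) ~= grp(tst));
end
manualLoss = nErr / c.NumObservations
```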
Visualize the optimized classifier.
d = 0.02;
[x1Grid,x2Grid] = meshgrid(min(cdata(:,1)):d:max(cdata(:,1)),...
    min(cdata(:,2)):d:max(cdata(:,2)));
xGrid = [x1Grid(:),x2Grid(:)];
[~,scores] = predict(svmmod,xGrid);
figure;
h = gobjects(3,1); % Preallocate graphics handles
h(1:2) = gscatter(cdata(:,1),cdata(:,2),grp,'rg','+*');
hold on
h(3) = plot(cdata(svmmod.IsSupportVector,1),...
    cdata(svmmod.IsSupportVector,2),'ko');
contour(x1Grid,x2Grid,reshape(scores(:,2),size(x1Grid)),[0 0],'k');
legend(h,{'-1','+1','Support Vectors'},'Location','Southeast');
axis equal
hold off
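The contour at level 0 of scores(:,2) is the decision boundary, because for a two-class SVM the second score column is the signed value f(x) for the positive class, ClassNames(2) (here +1). For the rbf kernel with kernel scale s, f(x) = sum_j alpha_j*y_j*exp(-||(x_j - x)/s||^2) + b over the support vectors. The sketch below checks that formula against predict on hypothetical toy data, using the ClassificationSVM properties Alpha, SupportVectors, SupportVectorLabels, Bias, and KernelParameters.Scale; it assumes Standardize is left at its default of false.

```matlab
% Sketch: recompute the positive-class SVM score by hand for an rbf model.
rng default
X = [randn(50,2)+[1 0]; randn(50,2)+[0 1]];   % hypothetical toy data
y = [ones(50,1); -ones(50,1)];
mdl = fitcsvm(X,y,'KernelFunction','rbf','BoxConstraint',1,'KernelScale',0.5);

Xq = [0 0; 1 0; 0 1; 2 2];          % query points
s  = mdl.KernelParameters.Scale;    % kernel scale
sv = mdl.SupportVectors;            % support vectors

% Gaussian kernel on scaled inputs: K(u,v) = exp(-||u/s - v/s||^2)
D2 = zeros(size(Xq,1),size(sv,1));
for j = 1:size(sv,1)
    D2(:,j) = sum(((Xq - sv(j,:))/s).^2,2);
end
K = exp(-D2);

% f(x) = sum_j alpha_j*y_j*K(x_j,x) + b
fManual = K*(mdl.Alpha.*mdl.SupportVectorLabels) + mdl.Bias;

[~,score] = predict(mdl,Xq);
fPredict = score(:,2);              % score for ClassNames(2), that is, +1
```

Query points with a positive score are assigned to class +1, which is why the zero contour separates the two predicted regions in the plot.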