Running Bayesian optimization in parallel can save time. Running in parallel
requires Parallel Computing Toolbox™. bayesopt evaluates the objective function
concurrently on parallel workers.
To optimize in parallel:
bayesopt: Set the UseParallel name-value pair to true. For example,
results = bayesopt(fun,vars,'UseParallel',true);
Fit functions: Set the UseParallel field of the HyperparameterOptimizationOptions structure to true. For example,
Mdl = fitcsvm(X,Y,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('UseParallel',true))
The parallel Bayesian optimization algorithm is similar to the serial algorithm, which is described in Bayesian Optimization Algorithm. The differences are:
bayesopt assigns points to evaluate to the parallel workers, generally one
point at a time. bayesopt computes on the client which point to assign next.
After bayesopt evaluates the initial random points, it chooses points to
evaluate by fitting a Gaussian process (GP) model. To fit a GP model while some
workers are still evaluating points, bayesopt imputes a value to each pending
point, that is, each point still being evaluated on a worker. The imputed value
is based on the GP model's mean prediction at that point, or is some other
value, as specified by the bayesopt 'ParallelMethod' name-value pair. For
parallel optimization of fit functions, bayesopt uses the default
ParallelMethod imputation.
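To make the imputation step concrete, the following MATLAB sketch computes the value that each ParallelMethod option might impute for a single pending point. It is a hypothetical illustration, not bayesopt's internal code: all variable names and numbers are made up, the problem is assumed to be unconstrained (so every observed point is feasible), and the formulas follow our reading of the ParallelMethod descriptions on the bayesopt reference page, in particular that 'clipped-model-prediction' takes the largest of the GP mean at the point, the minimum observed objective value, and the minimum GP mean prediction over feasible points.

```matlab
% Hypothetical illustration (not bayesopt internals): values that each
% ParallelMethod option might impute for one pending point x, assuming an
% unconstrained problem so that every observed point is feasible.
gpMeanAtX         = 0.20;               % GP mean prediction at pending point x
observedFeasible  = [0.35 0.50 0.90];   % objective values already observed
gpMeanMinFeasible = 0.25;               % smallest GP mean prediction over feasible points

methodNames = {'model-prediction','clipped-model-prediction', ...
               'max-observed','min-observed'};
imputed = zeros(1, numel(methodNames));
for k = 1:numel(methodNames)
    switch methodNames{k}
        case 'model-prediction'
            % Use the GP mean at x as-is
            imputed(k) = gpMeanAtX;
        case 'clipped-model-prediction'
            % Default: the GP mean, clipped from below so that a pending point
            % never looks better than the best observed or best predicted value
            imputed(k) = max([gpMeanAtX, min(observedFeasible), gpMeanMinFeasible]);
        case 'max-observed'
            % Pessimistic for minimization: the worst observed value
            imputed(k) = max(observedFeasible);
        case 'min-observed'
            % Optimistic for minimization: the best observed value
            imputed(k) = min(observedFeasible);
    end
end

% Print one line per option
for k = 1:numel(methodNames)
    fprintf('%-26s %.2f\n', methodNames{k}, imputed(k));
end
```

With these numbers, the plain GP mean (0.20) is lower than the best observed value (0.35), so clipping raises the default imputed value to 0.35. The more pessimistic 'max-observed' value discourages bayesopt from choosing new points near pending ones.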
After bayesopt assigns a point to evaluate, and before it computes a new point
to assign, it checks whether too many workers are idle. The threshold for active
workers is determined by the MinWorkerUtilization name-value pair. If too many
workers are idle, then bayesopt assigns random points, chosen uniformly within
bounds, to all idle workers. This step makes the workers active more quickly,
but the workers evaluate random points rather than fitted points. If the number
of idle workers does not exceed the threshold, then bayesopt chooses a point to
evaluate as usual, by fitting a GP model and maximizing the acquisition function.
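The idle-worker check can be sketched in MATLAB as follows. This is a hypothetical illustration, not bayesopt's implementation: the pool size, worker counts, and bounds are made up, and it assumes that "too many idle workers" means that fewer than MinWorkerUtilization workers are still active.

```matlab
% Hypothetical illustration of the idle-worker check; not bayesopt internals.
% Assumption: "too many idle workers" means fewer than MinWorkerUtilization
% workers are still active.
numWorkers = 8;                                % N, size of the parallel pool
minWorkerUtilization = floor(0.8*numWorkers);  % default floor(0.8*N), here 6
numActive = 4;                                 % workers still evaluating points
numIdle = numWorkers - numActive;              % here 4 idle workers

lb = [1 1];        % lower bounds of two optimization variables (made up)
ub = [100 100];    % upper bounds

if numActive < minWorkerUtilization
    % Too many idle workers: send every idle worker a point drawn
    % uniformly within the bounds, without fitting a GP model
    action = 'random';
    newPoints = lb + rand(numIdle, numel(lb)) .* (ub - lb);
else
    % Enough workers are busy: fit the GP model and maximize the
    % acquisition function to choose a single point (not shown here)
    action = 'fit-gp';
    newPoints = zeros(0, numel(lb));
end
fprintf('%s: %d new point(s)\n', action, size(newPoints, 1));
```

Here only 4 of 8 workers are active, below the default threshold of 6, so all 4 idle workers receive random points at once.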
Note
Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results.
Fit functions have no special settings for better parallel performance. In
contrast, several bayesopt settings can help speed up an optimization.
Setting the GPActiveSetSize option to a value smaller than the default (300)
can speed the solution. The cost is potential inaccuracy in the points that
bayesopt chooses to evaluate, because the GP model of the objective function
can be less accurate than with a larger value. Setting the option to a larger
value can result in a more accurate GP model, but requires more time to create
the model.
Setting the ParallelMethod option to 'max-observed' can lead bayesopt to
search more widely for a global optimum. This choice can lead to a better
solution in less time. However, the default value of
'clipped-model-prediction' is often best.
Setting the MinWorkerUtilization option to a large value can result in higher
parallel utilization. However, this setting causes more completely random
points to be evaluated, which can lead to less accurate solutions. What counts
as a large value depends on how many workers you have. The default is
floor(0.8*N), where N is the number of parallel workers. Setting the option to
a lower value can reduce parallel utilization, but with the benefit of
higher-quality points.
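The three settings above can be combined in one bayesopt call. The following MATLAB sketch reuses the ionosphere tree-tuning problem from this page; the specific values (a GPActiveSetSize of 150, 'max-observed', a MinWorkerUtilization of max(1,N-1), and 30 evaluations) are illustrative assumptions chosen only to show each option moving away from its default, not recommendations.

```matlab
% Illustrative values only; the best settings depend on your problem and pool.
load ionosphere                    % loads predictors X and class labels Y
splits  = optimizableVariable('splits',[1,100],'Type','integer');
minleaf = optimizableVariable('minleaf',[1,100],'Type','integer');
fun = @(params)kfoldLoss(fitctree(X,Y,'KFold',5, ...
    'MaxNumSplits',params.splits,'MinLeafSize',params.minleaf));

pool = gcp;                        % get (or start) the parallel pool
N = pool.NumWorkers;               % default MinWorkerUtilization is floor(0.8*N)

results = bayesopt(fun,[splits,minleaf], ...
    'UseParallel',true, ...
    'GPActiveSetSize',150, ...             % below the default 300: faster, possibly less accurate GP fits
    'ParallelMethod','max-observed', ...   % impute pending points with the worst observed value
    'MinWorkerUtilization',max(1,N-1), ... % >= default floor(0.8*N); larger once N >= 6
    'MaxObjectiveEvaluations',30);
```

Because each setting trades accuracy for speed or utilization differently, it is usually easier to change one option at a time and compare results against a run with the defaults.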
You can place an objective function on the parallel workers in one of three ways. The techniques with better performance require a more complex setup.
1. Automatic: If you give a function handle as the objective function, bayesopt
sends the handle to all the parallel workers at the beginning of its run. For
example,
load ionosphere
splits = optimizableVariable('splits',[1,100],'Type','integer');
minleaf = optimizableVariable('minleaf',[1,100],'Type','integer');
fun = @(params)kfoldLoss(fitctree(X,Y,'Kfold',5,...
    'MaxNumSplits',params.splits,'MinLeaf',params.minleaf));
results = bayesopt(fun,[splits,minleaf],'UseParallel',true);
This method is effective if the handle is small, or if you run the optimization only once. However, if you plan to run the optimization several times, you can save time by using one of the other two techniques.
2. Parallel constant: If you plan to run an optimization several times, save
time by transferring the objective function to the workers only once. This
technique is especially effective when the function handle incorporates a large
amount of data. Transfer the objective once by wrapping the function handle in
a parallel.pool.Constant (Parallel Computing Toolbox) construct, as in this
example.
load ionosphere
splits = optimizableVariable('splits',[1,100],'Type','integer');
minleaf = optimizableVariable('minleaf',[1,100],'Type','integer');
fun = @(params)kfoldLoss(fitctree(X,Y,'Kfold',5,...
    'MaxNumSplits',params.splits,'MinLeaf',params.minleaf));
% Each worker evaluates @() fun once, so C.Value is fun on every worker
C = parallel.pool.Constant(@() fun);
results1 = bayesopt(C,[splits,minleaf],'UseParallel',true);
results2 = bayesopt(C,[splits,minleaf],'UseParallel',true,...
    'MaxObjectiveEvaluations',50);
results3 = bayesopt(C,[splits,minleaf],'UseParallel',true,...
    'AcquisitionFunctionName','expected-improvement');
In this example, parallel.pool.Constant sends the function handle, together
with the data it captures, to the workers only once. All three bayesopt calls
reuse it.
3. Create objective function on workers: If you have a great deal of data to
send to the workers, you can avoid loading the data in the client by using spmd
(Parallel Computing Toolbox) to load the data on the workers. Use a Composite
(Parallel Computing Toolbox) with parallel.pool.Constant to access the
distributed objective functions.
% makeFun is defined at the end of this script
spmd
    fun = makeFun();
end
% fun is now a Composite. Get a parallel.pool.Constant
% that refers to it, without copying it to the client:
C = parallel.pool.Constant(fun);
% Alternatively, use the line
% C = parallel.pool.Constant(@makeFun);
% In this case, you do not use spmd.

% Call bayesopt, passing the Constant
splits = optimizableVariable('splits',[1 100],'Type','integer');
minleaf = optimizableVariable('minleaf',[1 100],'Type','integer');
bo = bayesopt(C,[splits minleaf],'UseParallel',true);

function f = makeFun()
    % Load into a struct: a function that contains nested functions
    % cannot have variables added to its workspace dynamically
    S = load('ionosphere','X','Y');
    X = S.X;
    Y = S.Y;
    f = @fun;
    function L = fun(Params)
        L = kfoldLoss(fitctree(X,Y, ...
            'KFold',5, ...
            'MaxNumSplits',Params.splits, ...
            'MinLeaf',Params.minleaf));
    end
end
In this example, the function handle exists only on the workers. The handle never appears on the client.
When bayesopt runs in parallel, its output differs from serial output in these
ways.
Iterative Display: The iterative display includes a column showing the number
of active workers. This count is taken after bayesopt assigns a job to the next
worker.
Plot Functions
Objective Function Model plot (@plotObjectiveModel) shows the pending points
(those points executing on parallel workers). The height of the points depends
on the ParallelMethod name-value pair.
Elapsed Time plot (@plotElapsedTime) shows the total elapsed time with the
label 'Real time' and the total objective function evaluation time, summed over
all workers, with the label 'Objective evaluation time (all workers)'. Objective
evaluation time includes the time to start a worker on a job.
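To see these parallel-specific outputs, you can request the iterative display and both plots explicitly. The following MATLAB sketch reuses the ionosphere example; the 30-evaluation limit is an illustrative assumption that keeps the run short, and passing 'PlotFcn' replaces the default set of plots with the two described above.

```matlab
load ionosphere                    % loads predictors X and class labels Y
splits  = optimizableVariable('splits',[1,100],'Type','integer');
minleaf = optimizableVariable('minleaf',[1,100],'Type','integer');
fun = @(params)kfoldLoss(fitctree(X,Y,'KFold',5, ...
    'MaxNumSplits',params.splits,'MinLeafSize',params.minleaf));

% Verbose 1 prints the iterative display, which in a parallel run includes
% the active-worker column; PlotFcn selects the two plots described above
results = bayesopt(fun,[splits,minleaf], ...
    'UseParallel',true, ...
    'Verbose',1, ...
    'PlotFcn',{@plotObjectiveModel,@plotElapsedTime}, ...
    'MaxObjectiveEvaluations',30);
```

Comparing the 'Real time' and 'Objective evaluation time (all workers)' curves in the Elapsed Time plot gives a rough sense of how much the parallel run saved over evaluating the objective serially.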
See Also: parallel.pool.Constant (Parallel Computing Toolbox) | spmd (Parallel Computing Toolbox)