This example shows how to include derivative information in nonlinear problem-based optimization when automatic derivatives do not apply, or when you want to include a Hessian, which is not provided using automatic differentiation. Including gradients or a Hessian in an optimization can give the following benefits:
More robust results. Finite differencing steps sometimes reach points where the objective or a nonlinear constraint function is undefined, not finite, or complex.
Increased accuracy. Analytic gradients can be more accurate than finite difference estimates.
Faster convergence. Including a Hessian can lead to faster convergence, meaning fewer iterations.
Improved performance. Analytic gradients can be faster to calculate than finite difference estimates, especially for problems with a sparse structure. For complicated expressions, however, analytic gradients can be slower to calculate.
Starting in R2020b, the solve
function
can use automatic differentiation for supported functions in order to supply gradients to
solvers. These automatic derivatives apply only to gradients (first derivatives), not
Hessians (second derivatives).
Automatic differentiation applies when you do not use fcn2optimexpr
to
create an objective or constraint function. If you need to use
fcn2optimexpr
, this example shows how to include derivative
information.
The way to use derivatives in problem-based optimization without automatic
differentiation is to convert your problem using prob2struct
, and
then edit the resulting objective and constraint functions. This example shows a hybrid
approach where automatic differentiation supplies derivatives for part of the objective
function.
With 2-D control variables x
and y
, use the
objective function
Include the constraint that the sum of squares of x
and
y
is no more than 4.
fun2
is not based on supported functions for optimization
expressions; see Supported Operations on Optimization Variables and Expressions. Therefore, to include
fun2
in an optimization problem, you must convert it to an optimization
expression using fcn2optimexpr
. (fun2
is not
supported because it uses the unsupported sech
function. However,
sech(x) = 1/cosh(x)
, and cosh
is a supported
function. To use 1/cosh
and automatic differentiation, see Effect of Automatic Differentiation in Problem-Based Optimization.)
To use AD on the supported functions, set up the problem without the unsupported
function fun2
, and include fun2
later.
prob = optimproblem; x = optimvar('x',2); y = optimvar('y',2); fun1 = 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2; fun2 = @(x,y)exp(-sum((x - y).^2))*exp(-exp(-y(1)))*sech(y(2)); prob.Objective = fun1; prob.Constraints.cons = sum(x.^2 + y.^2) <= 4;
To include derivatives of fun2
, first convert the problem without
fun2
to a structure using prob2struct
.
problem = prob2struct(prob,... 'ObjectiveFunctionName','generatedObjectiveBefore');
During the conversion, prob2struct
creates function files that
represent the objective and nonlinear constraint functions. By default, these functions have
the names 'generatedObjective.m'
and
'generatedConstraints.m'
. The objective function file without
fun2
is 'generatedObjectiveBefore.m'
.
The generated objective and constraint functions include gradients.
Calculate the derivatives associated with fun2
. If you have a
Symbolic Math Toolbox™ license, you can use the gradient
(Symbolic Math Toolbox) or jacobian
(Symbolic Math Toolbox) function to help compute the
derivatives. See Calculate Gradients and Hessians Using Symbolic Math Toolbox™.
The solver-based approach has one control variable. Each optimization variable
(x
and y
, in this example) is a portion of the
control variable. You can find the mapping between optimization variables and the single
control variable in the generated objective function file
'generatedObjectiveBefore.m'
. The beginning of the file contains these
lines of code or similar lines.
%% Variable indices. xidx = 1:2; yidx = 3:4; %% Map solver-based variables to problem-based. x = inputVariables(xidx); x = x(:); y = inputVariables(yidx); y = y(:);
In this code, the control variable has the name
inputVariables
.
Alternatively, you can find the mapping by using varindex
.
idx = varindex(prob); disp(idx.x)
1 2
disp(idx.y)
3 4
The full objective function includes the negative of fun2
:
fun2 = @(x,y)exp(-sum((x - y).^2))*exp(-exp(-y(1)))*sech(y(2));
Using standard calculus, calculate gradfun2
, the gradient of
fun2
.
where
Edit 'generatedObjectiveBefore.m'
to include
fun2
.
%% Compute objective function. arg2 = x(1); arg4 = (x(2) - arg2.^2); arg5 = 100; arg6 = arg4.^2; arg8 = (1 - x(1)); obj = ((arg5 .* arg6) + arg8.^2); fun2 = exp(-sum((x - y).^2))*exp(-exp(-y(1)))*sech(y(2)); obj = obj - fun2;
Include the calculated gradients in the objective function file by editing
'generatedObjectiveBefore.m'
. If you have a software version that does
not perform the gradient calculation, include all of these lines. If your software performs
the gradient calculation, include only the bold lines, which calculate the gradient of
fun2
.
%% Compute objective gradient. if nargout > 1 arg9 = 1; arg10 = (-(arg9.*2.*(arg8(:)))); arg11 = zeros([2,size(arg10,2)]); arg11(1,:) = arg10; arg12 = ((arg9.*arg5(:)).*2.*(arg4(:))); arg13 = zeros([2,size(arg12,2)]); arg13(2,:) = arg12; arg14 = ((-((arg9.*arg5(:)).*2.*(arg4(:)))).*2.*(arg2(:))); arg15 = zeros([2,size(arg14,2)]); arg15(1,:) = arg14; arg16 = zeros([4, 1]); arg16(xidx,:) = arg11 + arg13 + arg15; grad = arg16(:); % Calculate gradient of fun2 d = fun2; sigma1 = -2*(x(1) - y(1))*d; sigma2 = -2*(x(2) - y(2))*d; gradfun2 = zeros(4, 1); gradfun2(1) = sigma1; gradfun2(2) = sigma2; gradfun2(3) = -sigma1 + d*exp(-y(1)); gradfun2(4) = -sigma2 - d*tanh(y(2)); % Subtract the gradient of fun2 grad = grad - gradfun2; end
Recall that the nonlinear constraint is x(1)^2 + x(2)^2 + y(1)^2 + y(2)^2 <=
4
. The gradient of this constraint function is 2*[x;y]
. If
your software calculates the constraint gradient and includes it in the generated constraint
file, then you do not need to do anything more. If your software does not calculate the
constraint gradient, then include the gradient of the nonlinear constraint by editing
'generatedConstraints.m'
to include these lines.
%% Insert gradient calculation here. % If you include a gradient, notify the solver by setting the % SpecifyConstraintGradient option to true.
if nargout > 2 cineqGrad = 2*[x;y]; ceqGrad = []; end
Run the problem using both the solver-based and problem-based (no gradient) methods to see the differences. To run the solver-based problem using derivative information, create appropriate options in the problem structure.
options = optimoptions('fmincon','SpecifyObjectiveGradient',true,... 'SpecifyConstraintGradient',true); problem.options = options;
Nonlinear problems require a nonempty x0
field in the problem
structure.
x0 = [-1;2;1;-1]; problem.x0 = x0;
Call fmincon
on the problem structure.
[xsolver,fvalsolver,exitflagsolver,outputsolver] = fmincon(problem)
Local minimum found that satisfies the constraints. Optimization completed because the objective function is non-decreasing in feasible directions, to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance. <stopping criteria details> xsolver = 0.8671 0.7505 1.0433 0.5140 fvalsolver = -0.5500 exitflagsolver = 1 outputsolver = struct with fields: iterations: 46 funcCount: 77 constrviolation: 0 stepsize: 7.4091e-06 algorithm: 'interior-point' firstorderopt: 7.5203e-07 cgiterations: 9 message: '↵Local minimum found that satisfies the constraints.↵↵Optimization completed because the objective function is non-decreasing in ↵feasible directions, to within the value of the optimality tolerance,↵and constraints are satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The relative first-order optimality measure, 7.520304e-07,↵is less than options.OptimalityTolerance = 1.000000e-06, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.↵↵'
Compare this solution with the one obtained from solve
without
derivative information.
init.x = x0(1:2); init.y = x0(3:4); prob.Objective = prob.Objective - fcn2optimexpr(fun2,x,y); [xproblem,fvalproblem,exitflagproblem,outputproblem] = solve(prob,init,... "ConstraintDerivative","finite-differences",... "ObjectiveDerivative","finite-differences")
Solving problem using fmincon. Local minimum found that satisfies the constraints. Optimization completed because the objective function is non-decreasing in feasible directions, to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance. <stopping criteria details> xproblem = struct with fields: x: [2×1 double] y: [2×1 double] fvalproblem = -0.5500 exitflagproblem = OptimalSolution outputproblem = struct with fields: iterations: 47 funcCount: 269 constrviolation: 0 stepsize: 8.9896e-08 algorithm: 'interior-point' firstorderopt: 1.3377e-08 cgiterations: 9 message: '↵Local minimum found that satisfies the constraints.↵↵Optimization completed because the objective function is non-decreasing in ↵feasible directions, to within the value of the optimality tolerance,↵and constraints are satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The relative first-order optimality measure, 1.337675e-08,↵is less than options.OptimalityTolerance = 1.000000e-06, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.↵↵' bestfeasible: [1×1 struct] solver: 'fmincon'
Both solutions report the same function value to display precision, and both require roughly the same number of iterations (46 using gradient information, 47 without). However, the solution with gradients requires only 77 function evaluations, compared to 269 for the solution without gradients.
To include a Hessian, you must use prob2struct
, even if all your
functions are supported for optimization expressions. This example shows how to use a
Hessian for the fmincon
interior-point
algorithm. The fminunc
trust-region
algorithm and the fmincon
trust-region-reflective
algorithm use a different syntax; see Hessian for fminunc trust-region or fmincon trust-region-reflective algorithms.
As described in Hessian for fmincon interior-point algorithm, the Hessian is the Hessian of the Lagrangian.
(1) |
Include this Hessian by creating a function file to compute the Hessian, and creating a
HessianFcn
option for fmincon
to use the Hessian.
To create the Hessian in this case, create the second derivatives of the objective and
nonlinear constraints separately.
The second derivative matrix of the objective function for this example is complicated.
Its objective function listing hessianfun(x)
was created by Symbolic Math Toolbox using the same approach as described in Calculate Gradients and Hessians Using Symbolic Math Toolbox™.
function hessfun = hessianfun(in1) %HESSIANFUN % HESSFUN = HESSIANFUN(IN1) % This function was generated by Symbolic Math Toolbox version 8.5. % 15-Mar-2020 10:12:23 x1 = in1(1,:); x2 = in1(2,:); y1 = in1(3,:); y2 = in1(4,:); t2 = cosh(y2); t3 = sinh(y2); t4 = x1.*2.0; t5 = x2.*2.0; t6 = y1.*2.0; t7 = y2.*2.0; t8 = -y1; t10 = -y2; t15 = x1.*4.0e+2; t9 = -t6; t11 = -t7; t12 = 1.0./t2; t14 = exp(t8); t16 = t8+x1; t17 = t10+x2; t18 = -t15; t13 = t12.^2; t19 = t4+t9; t20 = t5+t11; t21 = t16.^2; t22 = t17.^2; t23 = -t14; t24 = exp(t23); t25 = t19.^2; t26 = t20.^2; t27 = -t21; t28 = -t22; t29 = t27+t28; t30 = exp(t29); t31 = t12.*t24.*t30.*2.0; t33 = t3.*t13.*t14.*t24.*t30; t34 = t3.*t13.*t19.*t24.*t30; t35 = t3.*t13.*t20.*t24.*t30; t36 = t12.*t24.*t25.*t30; t37 = t12.*t24.*t26.*t30; t42 = t12.*t14.*t19.*t24.*t30; t43 = t12.*t14.*t20.*t24.*t30; t44 = t12.*t20.*t23.*t24.*t30; t45 = t12.*t19.*t20.*t24.*t30; t32 = -t31; t38 = -t34; t39 = -t35; t40 = -t36; t41 = -t37; t46 = -t45; t49 = t43+t45; t47 = t18+t46; t48 = t38+t45; t50 = t32+t37+t39; t51 = t32+t36+t42; t52 = t33+t34+t44+t46; hessfun = reshape([t31+t40-x2.*4.0e+2+x1.^2.*1.2e+3+2.0,t47,... t51,t48,t47,t31+t41+2.0e+2,t49,t50,t51,... t49,t31+t40-t42.*2.0+t12.*t14.*t24.*t30-t12.*t24.*t30.*exp(t9),... t52,t48,t50,t52,t35.*2.0+t41+t12.*t24.*t30.*3.0-t3.^2.*t12.^3.*t24.*t30.*2.0],... [4,4]);
In contrast, the Hessian of the nonlinear inequality constraint is simple; it is twice the identity matrix.
hessianc = 2*eye(4);
Create the Hessian of the Lagrangian as a function handle.
H = @(x,lam)(hessianfun(x) + hessianc*lam.ineqnonlin);
Create options to use this Hessian.
problem.options.HessianFcn = H;
Solve the problem and display the number of iterations and number of function evaluations. The solution is approximately the same as before.
[xhess,fvalhess,exitflaghess,outputhess] = fmincon(problem); disp(outputhess.iterations) disp(outputhess.funcCount)
Local minimum found that satisfies the constraints. Optimization completed because the objective function is non-decreasing in feasible directions, to within the value of the optimality tolerance, and constraints are satisfied to within the value of the constraint tolerance. 31 59
This time, fmincon
takes only 31 iterations compared to over 45,
and only 59 function evaluations compared to over 75. In summary, providing an analytic
Hessian calculation can improve the efficiency of the solution process, but developing a
function to calculate the Hessian can be difficult.
fcn2optimexpr
| prob2struct
| varindex