Include Derivatives in Problem-Based Workflow

Why Include Derivatives?

The example shows how to include derivative information in nonlinear problem-based optimization. Including gradients or a Hessian in an optimization can give the following benefits:

  • More robust results. Finite differencing steps sometimes reach points where the objective or a nonlinear constraint function is undefined, not finite, or complex.

  • Analytic gradients can be more accurate than finite difference estimates.

  • Including a Hessian can lead to faster convergence, meaning fewer iterations.

  • Analytic gradients can be faster to calculate than finite difference estimates, especially for problems with a sparse structure. For complicated expressions, however, analytic gradients can be slower to calculate.

Despite these advantages, the problem-based approach currently does not use derivative information. To use derivatives in problem-based optimization, convert your problem using prob2struct, and edit the resulting objective and constraint functions.

Create Optimization Problem

Create a problem-based nonlinear optimization. With 2-D control variables x and y, use the objective function

fun1=100(x2x12)2+(1x1)2fun2=exp((xiyi)2)exp(exp(y1))sech(y2)objective = fun1 + fun2.

Include the constraint that the sum of squares of x and y is no more than 4.

fun2 is not based on supported functions for optimization expressions; see Supported Operations on Optimization Variables and Expressions. Therefore, to include fun2 in an optimization problem, you must convert it to an optimization expression using fcn2optimexpr.

prob = optimproblem;
x = optimvar('x',2);
y = optimvar('y',2);
fun1 = 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
fun2 = @(x,y)-exp(-sum((x - y).^2))*exp(-exp(-y(1)))*sech(y(2));
prob.Objective = fun1 + fcn2optimexpr(fun2,x,y);
prob.Constraints.cons = sum(x.^2 + y.^2) <= 4;

Convert Problem to Solver-Based Form

To include derivatives, convert the problem to a structure using prob2struct.

problem = prob2struct(prob);

During the conversion, prob2struct creates function files that represent the objective and nonlinear constraint functions. By default, these functions have the names 'generatedObjective.m' and 'generatedConstraints.m'.

Calculate Derivatives and Keep Track of Variables

Calculate the derivatives associated with the objective and nonlinear constraint functions. If you have a Symbolic Math Toolbox™ license, you can use the gradient or jacobian function to help compute the derivatives. See Symbolic Math Toolbox™ Calculates Gradients and Hessians.

The solver-based approach has one control variable. Each optimization variable (x or y, in this example) is a portion of the control variable.

You can find the mapping between optimization variables and the single control variable in the generated objective function file, 'generatedObjective.m'. The beginning of the file contains these lines of code:

%% Variable indices.
idx_x = [1 2];
idx_y = [3 4];

%% Map solver-based variables to problem-based.
x = inputVariables(idx_x);
x = x(:);
y = inputVariables(idx_y);
y = y(:);

In this code, the control variable has the name inputVariables.

Alternatively, you can find the mapping by using varindex.

idx = varindex(prob);
disp(idx.x)
     1     2
disp(idx.y)
     3     4

Once you know the mapping, you can use standard calculus to find the following expressions for the gradient grad of the objective function objective = fun1 + fun2 with respect to the control variable [x(:);y(:)].

grad=[2x1400x1(x2x12)+σ12200x2200x12+σ2σ1dexp(y1)σ2+dtanh(y2)],

where

c=exp((x1y1)2(x2y2)2)d=cexp(exp(y1))sech(y2)σ1=2(x1y1)dσ2=2(x2y2)d.

Edit the Objective and Constraint Files

To include the calculated gradients in the objective function file, edit 'generatedObjective.m' as follows.

%% Insert gradient calculation here.
% If you include a gradient, notify the solver by setting the
% SpecifyObjectiveGradient option to true.
if nargout > 1
    c = exp(-sum((x - y).^2));
    d = c*exp(-exp(-y(1)))*sech(y(2));
    sigma1 = 2*(x(1) - y(1))*d;
    sigma2 = 2*(x(2) - y(2))*d;
    grad = zeros(4,1);
    grad(1) = 2*x(1) - 400*x(1)*(x(2) - x(1)^2) + sigma1 - 2;
    grad(2) = 200*x(2) - 200*x(1)^2 + sigma2;
    grad(3) = -sigma1 - d*exp(-y(1));
    grad(4) = -sigma2 + d*tanh(y(2));
end

Recall that the nonlinear constraint is x(1)^2 + x(2)^2 + y(1)^2 + y(2)^2 <= 4. Clearly, the gradient of this constraint function is 2*[x;y]. To include the calculated gradients of the nonlinear constraint, edit 'generatedConstraints.m' as follows.

%% Insert gradient calculation here.
% If you include a gradient, notify the solver by setting the
% SpecifyConstraintGradient option to true.
if nargout > 2
    cineqGrad = 2*[x;y];
    ceqGrad = [];
end

Run Problem Using Two Methods

Run the problem using both the problem-based and solver-based methods to see the differences. To run the problem using derivative information, create appropriate options in the problem structure.

options = optimoptions('fmincon','SpecifyObjectiveGradient',true,...
    'SpecifyConstraintGradient',true);
problem.options = options;

Nonlinear problems require a nonempty x0 field in the problem structure.

x0 = [-1;2;1;-1];
problem.x0 = x0;

Call fmincon on the problem structure.

[xsolver,fvalsolver,exitflagsolver,outputsolver] = fmincon(problem)
Local minimum found that satisfies the constraints.

Optimization completed because the objective function is non-decreasing in 
feasible directions, to within the value of the optimality tolerance,
and constraints are satisfied to within the value of the constraint tolerance.

<stopping criteria details>

xsolver =

    0.8671
    0.7505
    1.0433
    0.5140


fvalsolver =

   -0.5500


exitflagsolver =

     1


outputsolver = 

  struct with fields:

         iterations: 46
          funcCount: 77
    constrviolation: 0
           stepsize: 7.4091e-06
          algorithm: 'interior-point'
      firstorderopt: 7.5203e-07
       cgiterations: 9
            message: '↵Local minimum found that satisfies the constraints.↵↵Optimization completed because the objective function is non-decreasing in ↵feasible directions, to within the value of the optimality tolerance,↵and constraints are satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The relative first-order optimality measure, 7.520304e-07,↵is less than options.OptimalityTolerance = 1.000000e-06, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.↵↵'

Compare this solution with the one obtained from solve, which uses no derivative information.

init.x = x0(1:2);
init.y = x0(3:4);
[xproblem,fvalproblem,exitflagproblem,outputproblem] = solve(prob,init)
Local minimum found that satisfies the constraints.

Optimization completed because the objective function is non-decreasing in 
feasible directions, to within the value of the optimality tolerance,
and constraints are satisfied to within the value of the constraint tolerance.

<stopping criteria details>

xproblem = 

  struct with fields:

    x: [2×1 double]
    y: [2×1 double]


fvalproblem =

   -0.5500


exitflagproblem = 

    OptimalSolution


outputproblem = 

  struct with fields:

         iterations: 48
          funcCount: 276
    constrviolation: 0
           stepsize: 7.9340e-07
          algorithm: 'interior-point'
      firstorderopt: 6.5496e-07
       cgiterations: 9
            message: '↵Local minimum found that satisfies the constraints.↵↵Optimization completed because the objective function is non-decreasing in ↵feasible directions, to within the value of the optimality tolerance,↵and constraints are satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The relative first-order optimality measure, 6.549635e-07,↵is less than options.OptimalityTolerance = 1.000000e-06, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.↵↵'
             solver: 'fmincon'

Both solutions report the same function value to display precision, and both require roughly the same number of iterations (46 using gradient information, 48 without). However, the solution using gradients requires only 77 function evaluations, compared to 276 for the solution without gradients.

See Also

| |

Related Topics