User-defined Hessians

In this tutorial, we explain how to write a user-defined function (see User-defined Functions) with a Hessian matrix explicitly provided by the user.

This tutorial uses the following packages:

using JuMP
import Ipopt

Rosenbrock example

As a simple example, we first consider the Rosenbrock function:

rosenbrock(x...) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
rosenbrock (generic function with 1 method)

which has the gradient vector:

function ∇rosenbrock(g::AbstractVector, x...)
    g[1] = 400 * x[1]^3 - 400 * x[1] * x[2] + 2 * x[1] - 2
    g[2] = 200 * (x[2] - x[1]^2)
    return
end
∇rosenbrock (generic function with 1 method)

and the Hessian matrix:

function ∇²rosenbrock(H::AbstractMatrix, x...)
    H[1, 1] = 1200 * x[1]^2 - 400 * x[2] + 2
    # H[1, 2] = -400 * x[1] <-- not needed because Hessian is symmetric
    H[2, 1] = -400 * x[1]
    H[2, 2] = 200.0
    return
end
∇²rosenbrock (generic function with 1 method)

You may assume the Hessian matrix H is initialized with zeros, and because it is symmetric, you need only fill in the non-zero lower-triangular terms.

The matrix type passed in as H depends on the automatic differentiation system, so make sure the first argument to the Hessian function accepts an AbstractMatrix (it may be something other than Matrix{Float64}). However, you may assume only that H supports size(H) and setindex!.
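
For example (a hedged sketch, not part of the original tutorial; the name ∇²rosenbrock_checked is hypothetical), a Hessian callback written against only that interface might check its assumptions explicitly and avoid reading from or broadcasting into H:

function ∇²rosenbrock_checked(H::AbstractMatrix, x...)
    # Use only size(H) and setindex!; do not assume H is a Matrix{Float64}.
    @assert size(H) == (2, 2)
    H[1, 1] = 1200 * x[1]^2 - 400 * x[2] + 2
    H[2, 1] = -400 * x[1]  # fill the lower triangle only; H is symmetric
    H[2, 2] = 200.0
    return
end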

Now that we have the function, its gradient, and its Hessian, we can construct a JuMP model, register the function, and use it in an @NL macro:

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2])
register(model, :rosenbrock, 2, rosenbrock, ∇rosenbrock, ∇²rosenbrock)
@NLobjective(model, Min, rosenbrock(x[1], x[2]))
optimize!(model)
solution_summary(model; verbose = true)
* Solver : Ipopt

* Status
  Termination status : LOCALLY_SOLVED
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Result count       : 1
  Has duals          : true
  Message from the solver:
  "Solve_Succeeded"

* Candidate solution
  Objective value      : 1.20425e-27
  Dual objective value : 0.00000e+00
  Primal solution :
    x[1] : 1.00000e+00
    x[2] : 1.00000e+00
  Dual solution :

* Work counters
  Solve time (sec)   : 5.19359e-02

Bilevel optimization

User-defined Hessian functions can be useful when solving more complicated problems. In the rest of this tutorial, our goal is to solve the bilevel optimization problem:

\[\begin{array}{r l} \min\limits_{x,z} & x_1^2 + x_2^2 + z \\ s.t. & \begin{array}{r l} z \ge \max\limits_{y} & x_1^2 y_1 + x_2^2 y_2 - x_1 y_1^4 - 2 x_2 y_2^4 \\ s.t. & (y_1 - 10)^2 + (y_2 - 10)^2 \le 25 \end{array} \\ & x \ge 0. \end{array}\]

This bilevel optimization problem is composed of two nested optimization problems: an upper level, involving variables $x$, and a lower level, involving variables $y$. From the perspective of the lower-level problem, the values of $x$ are fixed parameters, and so the model optimizes $y$ given those fixed parameters. Simultaneously, the upper-level problem optimizes $x$ and $z$ given the response of $y$.

Decomposition

There are a few ways to solve this problem, but we are going to use a nonlinear decomposition method. The first step is to write a function to compute the lower-level problem:

\[\begin{array}{r l} V(x_1, x_2) = \max\limits_{y} & x_1^2 y_1 + x_2^2 y_2 - x_1 y_1^4 - 2 x_2 y_2^4 \\ s.t. & (y_1 - 10)^2 + (y_2 - 10)^2 \le 25 \end{array}\]

function solve_lower_level(x...)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, y[1:2])
    @NLobjective(
        model,
        Max,
        x[1]^2 * y[1] + x[2]^2 * y[2] - x[1] * y[1]^4 - 2 * x[2] * y[2]^4,
    )
    @constraint(model, (y[1] - 10)^2 + (y[2] - 10)^2 <= 25)
    optimize!(model)
    @assert termination_status(model) == LOCALLY_SOLVED
    return objective_value(model), value.(y)
end
solve_lower_level (generic function with 1 method)

This function takes a value of $x$ and returns the optimal lower-level objective value and the optimal response $y$. The reason why we need both the objective and the optimal $y$ will be made clear shortly, but for now, let us define:

function V(x...)
    f, _ = solve_lower_level(x...)
    return f
end
V (generic function with 1 method)

Then, we can substitute $V$ into our full problem to create:

\[\begin{array}{r l} \min\limits_{x} & x_1^2 + x_2^2 + V(x_1, x_2) \\ s.t. & x \ge 0. \end{array}\]

This looks like a nonlinear optimization problem with a user-defined function $V$! However, because $V$ solves an optimization problem internally, we can't use automatic differentiation to compute its first and second derivatives. Instead, we can use JuMP's ability to pass callback functions for the gradient and Hessian.

First up, we need to define the gradient of $V$ with respect to $x$. In general, this may be difficult to compute, but because $x$ appears only in the objective, we can just differentiate the objective function with respect to $x$, treating the optimal $y$ as fixed.
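
Writing $y$ for the optimal lower-level solution returned by solve_lower_level, this gives (restating the derivatives that the callback below implements):

\[\frac{\partial V}{\partial x_1} = 2 x_1 y_1 - y_1^4, \quad \frac{\partial V}{\partial x_2} = 2 x_2 y_2 - 2 y_2^4\]

The corresponding callback is: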

function ∇V(g::AbstractVector, x...)
    _, y = solve_lower_level(x...)
    g[1] = 2 * x[1] * y[1] - y[1]^4
    g[2] = 2 * x[2] * y[2] - 2 * y[2]^4
    return
end
∇V (generic function with 1 method)

Second, we need to define the Hessian of $V$ with respect to $x$. This is a symmetric matrix, but in our example only the diagonal elements are non-zero.
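
Treating the optimal $y$ as fixed, as we did for the gradient, the non-zero entries are:

\[\frac{\partial^2 V}{\partial x_1^2} = 2 y_1, \quad \frac{\partial^2 V}{\partial x_2^2} = 2 y_2\]

The corresponding callback is: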

function ∇²V(H::AbstractMatrix, x...)
    _, y = solve_lower_level(x...)
    H[1, 1] = 2 * y[1]
    H[2, 2] = 2 * y[2]
    return
end
∇²V (generic function with 1 method)

We now have enough to define our bilevel optimization problem:

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2] >= 0)
register(model, :V, 2, V, ∇V, ∇²V)
@NLobjective(model, Min, x[1]^2 + x[2]^2 + V(x[1], x[2]))
optimize!(model)
solution_summary(model)
* Solver : Ipopt

* Status
  Termination status : LOCALLY_SOLVED
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Message from the solver:
  "Solve_Succeeded"

* Candidate solution
  Objective value      : -4.18983e+05
  Dual objective value : 0.00000e+00

* Work counters
  Solve time (sec)   : 1.03544e+00

The optimal objective value is:

objective_value(model)
-418983.4868064086

and the optimal upper-level decision variables $x$ are:

value.(x)
2-element Vector{Float64}:
 154.97862349841566
 180.00961418355053

To compute the optimal lower-level decision variables, we need to call solve_lower_level with the optimal upper-level decision variables:

_, y = solve_lower_level(value.(x)...)
y
2-element Vector{Float64}:
 7.072593960195712
 5.946569893523139

Improving performance

Our solution approach works, but it has a performance problem: every time we need to compute the value, gradient, or Hessian of $V$, we have to re-solve the lower-level optimization problem! This is wasteful, because we will often call the gradient and Hessian at the same point, and so solving the problem twice with the same input repeats work unnecessarily.

We can work around this by using a cache:

mutable struct Cache
    x::Any              # the last point at which the lower-level problem was solved
    f::Float64          # the optimal lower-level objective value at that point
    y::Vector{Float64}  # the optimal lower-level solution y at that point
end

with a function to update the cache if needed:

function _update_if_needed(cache::Cache, x...)
    # Only re-solve the lower-level problem if `x` differs from the cached point.
    # `!==` is the negation of `===`; for a tuple of `Float64` values this
    # compares the stored values, so repeated calls at the same point skip the solve.
    if cache.x !== x
        cache.f, cache.y = solve_lower_level(x...)
        cache.x = x
    end
    return
end
_update_if_needed (generic function with 1 method)

Then, we define cached versions of our three functions, which call _update_if_needed and return values from the cache:

function cached_f(cache::Cache, x...)
    _update_if_needed(cache, x...)
    return cache.f
end

function cached_∇f(cache::Cache, g::AbstractVector, x...)
    _update_if_needed(cache, x...)
    g[1] = 2 * x[1] * cache.y[1] - cache.y[1]^4
    g[2] = 2 * x[2] * cache.y[2] - 2 * cache.y[2]^4
    return
end

function cached_∇²f(cache::Cache, H::AbstractMatrix, x...)
    _update_if_needed(cache, x...)
    H[1, 1] = 2 * cache.y[1]
    H[2, 2] = 2 * cache.y[2]
    return
end
cached_∇²f (generic function with 1 method)
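
As a quick illustration (a sketch, not part of the original tutorial; the trial point is arbitrary), repeated calls at the same point solve the lower-level problem only once:

# Hypothetical usage: the second call finds cache.x equal to the input tuple,
# so it reuses the cached solution instead of calling solve_lower_level again.
demo_cache = Cache(Float64[], NaN, Float64[])
g = zeros(2)
cached_f(demo_cache, 155.0, 180.0)      # solves the lower-level problem, fills the cache
cached_∇f(demo_cache, g, 155.0, 180.0)  # reuses demo_cache.y; no second solve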

Now we're ready to set up and solve the upper-level optimization problem:

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2] >= 0)
cache = Cache(Float64[], NaN, Float64[])
register(
    model,
    :V,
    2,
    (x...) -> cached_f(cache, x...),
    (g, x...) -> cached_∇f(cache, g, x...),
    (H, x...) -> cached_∇²f(cache, H, x...),
)
@NLobjective(model, Min, x[1]^2 + x[2]^2 + V(x[1], x[2]))
optimize!(model)
solution_summary(model)
* Solver : Ipopt

* Status
  Termination status : LOCALLY_SOLVED
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Message from the solver:
  "Solve_Succeeded"

* Candidate solution
  Objective value      : -4.18983e+05
  Dual objective value : 0.00000e+00

* Work counters
  Solve time (sec)   : 6.14560e-01

and we can check that we get the same objective value:

objective_value(model)
-418983.4868064086

and the upper-level decision variables $x$:

value.(x)
2-element Vector{Float64}:
 154.97862349841566
 180.00961418355053

Tip

This tutorial was generated using Literate.jl. View the source .jl file on GitHub.