Nested optimization problems

In this tutorial we explain how to write a nested optimization problem, where an outer problem uses the results from the optimization of an inner subproblem.

To model the problem, we define a user-defined function to handle the decomposition of the inner problem inside the outer one. Finally, we show how to improve the performance by using a cache that avoids resolving the inner problem.

For a simpler example of writing a user-defined function, see the User-defined Hessians tutorial.

This tutorial uses the following packages:

using JuMP
import Ipopt

In the rest of this tutorial, our goal is to solve the bilevel optimization problem:

\[\begin{array}{r l} \min\limits_{x,z} & x_1^2 + x_2^2 + z \\ s.t. & \begin{array}{r l} z = \max\limits_{y} & x_1^2 y_1 + x_2^2 y_2 - x_1 y_1^4 - 2 x_2 y_2^4 \\ s.t. & (y_1 - 10)^2 + (y_2 - 10)^2 \le 25 \end{array} \\ & x \ge 0. \end{array}\]

This bilevel optimization problem is composed of two nested optimization problems. An upper level, involving variables $x$, and a lower level, involving variables $y$. From the perspective of the lower-level problem, the values of $x$ are fixed parameters, and so the model optimizes $y$ given those fixed parameters. Simultaneously, the upper-level problem optimizes $x$ and $z$ given the response of $y$.

Decomposition

There are a few ways to solve this problem, but we are going to use a nonlinear decomposition method. The first step is to write a function to compute the lower-level problem:

\[\begin{array}{r l} V(x_1, x_2) = \max\limits_{y} & x_1^2 y_1 + x_2^2 y_2 - x_1 y_1^4 - 2 x_2 y_2^4 \\ s.t. & (y_1 - 10)^2 + (y_2 - 10)^2 \le 25 \end{array}\]

function solve_lower_level(x...)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, y[1:2])
    @NLobjective(
        model,
        Max,
        x[1]^2 * y[1] + x[2]^2 * y[2] - x[1] * y[1]^4 - 2 * x[2] * y[2]^4,
    )
    @constraint(model, (y[1] - 10)^2 + (y[2] - 10)^2 <= 25)
    optimize!(model)
    @assert termination_status(model) == LOCALLY_SOLVED
    return objective_value(model), value.(y)
end
solve_lower_level (generic function with 1 method)

The next function takes a value of $x$ and returns the optimal lower-level objective-value and the optimal response $y$. The reason why we need both the objective and the optimal $y$ will be made clear shortly, but for now let us define:

function V(x...)
    f, _ = solve_lower_level(x...)
    return f
end
V (generic function with 1 method)

Then, we can substitute $V$ into our full problem to create:

\[\begin{array}{r l} \min\limits_{x} & x_1^2 + x_2^2 + V(x_1, x_2) \\ s.t. & x \ge 0. \end{array}\]

This looks like a nonlinear optimization problem with a user-defined function $V$! However, because $V$ solves an optimization problem internally, we can't use automatic differentiation to compute the first and second derivatives. Instead, we can use JuMP's ability to pass callback functions for the gradient and Hessian instead.

First up, we need to define the gradient of $V$ with respect to $x$. In general, this may be difficult to compute, but because $x$ appears only in the objective, we can just differentiate the objective function with respect to $x$, giving:

function ∇V(g::AbstractVector, x...)
    _, y = solve_lower_level(x...)
    g[1] = 2 * x[1] * y[1] - y[1]^4
    g[2] = 2 * x[2] * y[2] - 2 * y[2]^4
    return
end
∇V (generic function with 1 method)

Second, we need to define the Hessian of $V$ with respect to $x$. This is a symmetric matrix, but in our example only the diagonal elements are non-zero:

function ∇²V(H::AbstractMatrix, x...)
    _, y = solve_lower_level(x...)
    H[1, 1] = 2 * y[1]
    H[2, 2] = 2 * y[2]
    return
end
∇²V (generic function with 1 method)
Info

Providing an explicit Hessian function is optional if first derivatives are already available.

We now have enough to define our bilevel optimization problem:

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2] >= 0)
register(model, :V, 2, V, ∇V, ∇²V)
@NLobjective(model, Min, x[1]^2 + x[2]^2 + V(x[1], x[2]))
optimize!(model)
solution_summary(model)
* Solver : Ipopt

* Status
  Result count       : 1
  Termination status : LOCALLY_SOLVED
  Message from the solver:
  "Solve_Succeeded"

* Candidate solution (result #1)
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Objective value    : -4.18983e+05
  Dual objective value : 0.00000e+00

* Work counters
  Solve time (sec)   : 9.50017e-01

The optimal objective value is:

objective_value(model)
-418983.4868064086

and the optimal upper-level decision variables $x$ are:

value.(x)
2-element Vector{Float64}:
 154.97862349841566
 180.00961418355053

To compute the optimal lower-level decision variables, we need to call solve_lower_level with the optimal upper-level decision variables:

_, y = solve_lower_level(value.(x)...)
y
2-element Vector{Float64}:
 7.072593960195712
 5.946569893523139

Improving performance

Our solution approach works, but it has a performance problem: every time we need to compute the value, gradient, or Hessian of $V$, we have to re-solve the lower-level optimization problem. This is wasteful, because we will often call the gradient and Hessian at the same point, and so solving the problem twice with the same input repeats work unnecessarily.

We can work around this by using a cache:

mutable struct Cache
    x::Any
    f::Float64
    y::Vector{Float64}
end

with a function to update the cache if needed:

function _update_if_needed(cache::Cache, x...)
    if cache.x !== x
        cache.f, cache.y = solve_lower_level(x...)
        cache.x = x
    end
    return
end
_update_if_needed (generic function with 1 method)

Then, we define cached versions of out three functions which call _updated_if_needed and return values from the cache.

function cached_f(cache::Cache, x...)
    _update_if_needed(cache, x...)
    return cache.f
end

function cached_∇f(cache::Cache, g::AbstractVector, x...)
    _update_if_needed(cache, x...)
    g[1] = 2 * x[1] * cache.y[1] - cache.y[1]^4
    g[2] = 2 * x[2] * cache.y[2] - 2 * cache.y[2]^4
    return
end

function cached_∇²f(cache::Cache, H::AbstractMatrix, x...)
    _update_if_needed(cache, x...)
    H[1, 1] = 2 * cache.y[1]
    H[2, 2] = 2 * cache.y[2]
    return
end
cached_∇²f (generic function with 1 method)

Now we're ready to setup and solve the upper level optimization problem:

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2] >= 0)
cache = Cache(Float64[], NaN, Float64[])
register(
    model,
    :V,
    2,
    (x...) -> cached_f(cache, x...),
    (g, x...) -> cached_∇f(cache, g, x...),
    (H, x...) -> cached_∇²f(cache, H, x...),
)
@NLobjective(model, Min, x[1]^2 + x[2]^2 + V(x[1], x[2]))
optimize!(model)
solution_summary(model)
* Solver : Ipopt

* Status
  Result count       : 1
  Termination status : LOCALLY_SOLVED
  Message from the solver:
  "Solve_Succeeded"

* Candidate solution (result #1)
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Objective value    : -4.18983e+05
  Dual objective value : 0.00000e+00

* Work counters
  Solve time (sec)   : 3.54468e-01

an we can check we get the same objective value:

objective_value(model)
-418983.4868064086

and upper-level decision variable $x$:

value.(x)
2-element Vector{Float64}:
 154.97862349841566
 180.00961418355053

Tip

This tutorial was generated using Literate.jl. View the source .jl file on GitHub.