Nested optimization problems

This tutorial was generated using Literate.jl. Download the source as a .jl file.

The purpose of this tutorial is to show how to solve a nested optimization problem, where an upper problem uses the results from the optimization of a lower subproblem.

To model the problem, we define a user-defined operator to handle the decomposition of the lower problem inside the upper one. Finally, we show how to improve the performance by using a cache that avoids resolving the lower problem.

For a simpler example of writing a user-defined operator, see the User-defined Hessians tutorial.

Info

The JuMP extension BilevelJuMP.jl can also be used to model and solve bilevel optimization problems.

Required packages

This tutorial uses the following packages:

using JuMP
import Ipopt

Formulation

In the rest of this tutorial, our goal is to solve the bilevel optimization problem:

\[\begin{array}{r l} \min\limits_{x,z} & x_1^2 + x_2^2 + z \\ s.t. & \begin{array}{r l} z = \max\limits_{y} & x_1^2 y_1 + x_2^2 y_2 - x_1 y_1^4 - 2 x_2 y_2^4 \\ s.t. & (y_1 - 10)^2 + (y_2 - 10)^2 \le 25 \end{array} \\ & x \ge 0. \end{array}\]

This bilevel optimization problem is composed of two nested optimization problems. An upper level, involving variables $x$, and a lower level, involving variables $y$. From the perspective of the lower-level problem, the values of $x$ are fixed parameters, and so the model optimizes $y$ given those fixed parameters. Simultaneously, the upper-level problem optimizes $x$ and $z$ given the response of $y$.

Decomposition

There are a few ways to solve this problem, but we are going to use a nonlinear decomposition method. The first step is to write a function to compute the lower-level problem:

\[\begin{array}{r l} V(x_1, x_2) = \max\limits_{y} & x_1^2 y_1 + x_2^2 y_2 - x_1 y_1^4 - 2 x_2 y_2^4 \\ s.t. & (y_1 - 10)^2 + (y_2 - 10)^2 \le 25 \end{array}\]

function solve_lower_level(x...)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, y[1:2])
    @objective(
        model,
        Max,
        x[1]^2 * y[1] + x[2]^2 * y[2] - x[1] * y[1]^4 - 2 * x[2] * y[2]^4,
    )
    @constraint(model, (y[1] - 10)^2 + (y[2] - 10)^2 <= 25)
    optimize!(model)
    assert_is_solved_and_feasible(model)
    return objective_value(model), value.(y)
end

solve_lower_level (generic function with 1 method)

The next function takes a value of $x$ and returns the optimal lower-level objective-value and the optimal response $y$. The reason why we need both the objective and the optimal $y$ will be made clear shortly, but for now let us define:

function V(x...)
    f, _ = solve_lower_level(x...)
    return f
end

V (generic function with 1 method)

Then, we can substitute $V$ into our full problem to create:

\[\begin{array}{r l} \min\limits_{x} & x_1^2 + x_2^2 + V(x_1, x_2) \\ s.t. & x \ge 0. \end{array}\]

This looks like a nonlinear optimization problem with a user-defined operator $V$! However, because $V$ solves an optimization problem internally, we can't use automatic differentiation to compute the first and second derivatives. Instead, we can use JuMP's ability to pass callback functions for the gradient and Hessian instead.

First up, we need to define the gradient of $V$ with respect to $x$. In general, this may be difficult to compute, but because $x$ appears only in the objective, we can just differentiate the objective function with respect to $x$, giving:

function ∇V(g::AbstractVector, x...)
    _, y = solve_lower_level(x...)
    g[1] = 2 * x[1] * y[1] - y[1]^4
    g[2] = 2 * x[2] * y[2] - 2 * y[2]^4
    return
end

∇V (generic function with 1 method)

Second, we need to define the Hessian of $V$ with respect to $x$. This is a symmetric matrix, but in our example only the diagonal elements are non-zero:

function ∇²V(H::AbstractMatrix, x...)
    _, y = solve_lower_level(x...)
    H[1, 1] = 2 * y[1]
    H[2, 2] = 2 * y[2]
    return
end

∇²V (generic function with 1 method)

Info

Providing an explicit Hessian function is optional if first derivatives are already available.

We now have enough to define our bilevel optimization problem:

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2] >= 0)
@operator(model, op_V, 2, V, ∇V, ∇²V)
@objective(model, Min, x[1]^2 + x[2]^2 + op_V(x[1], x[2]))
optimize!(model)
assert_is_solved_and_feasible(model)
solution_summary(model)

solution_summary(; result = 1, verbose = false)
├ solver_name          : Ipopt
├ Termination
│ ├ termination_status : LOCALLY_SOLVED
│ ├ result_count       : 1
│ └ raw_status         : Solve_Succeeded
├ Solution (result = 1)
│ ├ primal_status        : FEASIBLE_POINT
│ ├ dual_status          : FEASIBLE_POINT
│ ├ objective_value      : -4.18983e+05
│ └ dual_objective_value : 0.00000e+00
└ Work counters
  ├ solve_time (sec)   : 5.14062e-01
  └ barrier_iterations : 32

The optimal objective value is:

objective_value(model)

-418983.48680640775

and the optimal upper-level decision variables $x$ are:

value.(x)

2-element Vector{Float64}:
 154.97862337234338
 180.0096143098799

To compute the optimal lower-level decision variables, we need to call solve_lower_level with the optimal upper-level decision variables:

_, y = solve_lower_level(value.(x)...)
y

2-element Vector{Float64}:
 7.072593961143734
 5.94656989283847

Improving performance

Our solution approach works, but it has a performance problem: every time we need to compute the value, gradient, or Hessian of $V$, we have to re-solve the lower-level optimization problem. This is wasteful, because we will often call the gradient and Hessian at the same point, and so solving the problem twice with the same input repeats work unnecessarily.

We can work around this by using a cache:

mutable struct Cache
    x::Any
    f::Float64
    y::Vector{Float64}
end

with a function to update the cache if needed:

function _update_if_needed(cache::Cache, x...)
    if cache.x !== x
        cache.f, cache.y = solve_lower_level(x...)
        cache.x = x
    end
    return
end

_update_if_needed (generic function with 1 method)

Then, we define cached versions of out three functions which call _updated_if_needed and return values from the cache.

function cached_f(cache::Cache, x...)
    _update_if_needed(cache, x...)
    return cache.f
end

function cached_∇f(cache::Cache, g::AbstractVector, x...)
    _update_if_needed(cache, x...)
    g[1] = 2 * x[1] * cache.y[1] - cache.y[1]^4
    g[2] = 2 * x[2] * cache.y[2] - 2 * cache.y[2]^4
    return
end

function cached_∇²f(cache::Cache, H::AbstractMatrix, x...)
    _update_if_needed(cache, x...)
    H[1, 1] = 2 * cache.y[1]
    H[2, 2] = 2 * cache.y[2]
    return
end

cached_∇²f (generic function with 1 method)

Now we're ready to setup and solve the upper level optimization problem:

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2] >= 0)
cache = Cache(Float64[], NaN, Float64[])
@operator(
    model,
    op_cached_f,
    2,
    (x...) -> cached_f(cache, x...),
    (g, x...) -> cached_∇f(cache, g, x...),
    (H, x...) -> cached_∇²f(cache, H, x...),
)
@objective(model, Min, x[1]^2 + x[2]^2 + op_cached_f(x[1], x[2]))
optimize!(model)
assert_is_solved_and_feasible(model)
solution_summary(model)

solution_summary(; result = 1, verbose = false)
├ solver_name          : Ipopt
├ Termination
│ ├ termination_status : LOCALLY_SOLVED
│ ├ result_count       : 1
│ └ raw_status         : Solve_Succeeded
├ Solution (result = 1)
│ ├ primal_status        : FEASIBLE_POINT
│ ├ dual_status          : FEASIBLE_POINT
│ ├ objective_value      : -4.18983e+05
│ └ dual_objective_value : 0.00000e+00
└ Work counters
  ├ solve_time (sec)   : 1.92469e-01
  └ barrier_iterations : 32

an we can check we get the same objective value:

objective_value(model)

-418983.48680640775

and upper-level decision variable $x$:

value.(x)

2-element Vector{Float64}:
 154.97862337234338
 180.0096143098799