Nested optimization problems
This tutorial was generated using Literate.jl. Download the source as a .jl
file.
The purpose of this tutorial is to show how to solve a nested optimization problem, where an upper problem uses the results from the optimization of a lower subproblem.
To model the problem, we define a user-defined operator to handle the decomposition of the lower problem inside the upper one. Finally, we show how to improve the performance by using a cache that avoids resolving the lower problem.
For a simpler example of writing a user-defined operator, see the User-defined Hessians tutorial.
The JuMP extension BilevelJuMP.jl can also be used to model and solve bilevel optimization problems.
Required packages
This tutorial uses the following packages:
using JuMP
import Ipopt
Formulation
In the rest of this tutorial, our goal is to solve the bilevel optimization problem:
\[\begin{array}{r l} \min\limits_{x,z} & x_1^2 + x_2^2 + z \\ s.t. & \begin{array}{r l} z = \max\limits_{y} & x_1^2 y_1 + x_2^2 y_2 - x_1 y_1^4 - 2 x_2 y_2^4 \\ s.t. & (y_1 - 10)^2 + (y_2 - 10)^2 \le 25 \end{array} \\ & x \ge 0. \end{array}\]
This bilevel optimization problem is composed of two nested optimization problems. An upper level, involving variables $x$, and a lower level, involving variables $y$. From the perspective of the lower-level problem, the values of $x$ are fixed parameters, and so the model optimizes $y$ given those fixed parameters. Simultaneously, the upper-level problem optimizes $x$ and $z$ given the response of $y$.
Decomposition
There are a few ways to solve this problem, but we are going to use a nonlinear decomposition method. The first step is to write a function to compute the lower-level problem:
\[\begin{array}{r l} V(x_1, x_2) = \max\limits_{y} & x_1^2 y_1 + x_2^2 y_2 - x_1 y_1^4 - 2 x_2 y_2^4 \\ s.t. & (y_1 - 10)^2 + (y_2 - 10)^2 \le 25 \end{array}\]
function solve_lower_level(x...)
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, y[1:2])
@objective(
model,
Max,
x[1]^2 * y[1] + x[2]^2 * y[2] - x[1] * y[1]^4 - 2 * x[2] * y[2]^4,
)
@constraint(model, (y[1] - 10)^2 + (y[2] - 10)^2 <= 25)
optimize!(model)
@assert is_solved_and_feasible(model)
return objective_value(model), value.(y)
end
solve_lower_level (generic function with 1 method)
The next function takes a value of $x$ and returns the optimal lower-level objective-value and the optimal response $y$. The reason why we need both the objective and the optimal $y$ will be made clear shortly, but for now let us define:
function V(x...)
f, _ = solve_lower_level(x...)
return f
end
V (generic function with 1 method)
Then, we can substitute $V$ into our full problem to create:
\[\begin{array}{r l} \min\limits_{x} & x_1^2 + x_2^2 + V(x_1, x_2) \\ s.t. & x \ge 0. \end{array}\]
This looks like a nonlinear optimization problem with a user-defined operator $V$! However, because $V$ solves an optimization problem internally, we can't use automatic differentiation to compute the first and second derivatives. Instead, we can use JuMP's ability to pass callback functions for the gradient and Hessian instead.
First up, we need to define the gradient of $V$ with respect to $x$. In general, this may be difficult to compute, but because $x$ appears only in the objective, we can just differentiate the objective function with respect to $x$, giving:
function ∇V(g::AbstractVector, x...)
_, y = solve_lower_level(x...)
g[1] = 2 * x[1] * y[1] - y[1]^4
g[2] = 2 * x[2] * y[2] - 2 * y[2]^4
return
end
∇V (generic function with 1 method)
Second, we need to define the Hessian of $V$ with respect to $x$. This is a symmetric matrix, but in our example only the diagonal elements are non-zero:
function ∇²V(H::AbstractMatrix, x...)
_, y = solve_lower_level(x...)
H[1, 1] = 2 * y[1]
H[2, 2] = 2 * y[2]
return
end
∇²V (generic function with 1 method)
Providing an explicit Hessian function is optional if first derivatives are already available.
We now have enough to define our bilevel optimization problem:
model = Model(Ipopt.Optimizer)
@variable(model, x[1:2] >= 0)
@operator(model, op_V, 2, V, ∇V, ∇²V)
@objective(model, Min, x[1]^2 + x[2]^2 + op_V(x[1], x[2]))
optimize!(model)
@assert is_solved_and_feasible(model)
solution_summary(model)
* Solver : Ipopt
* Status
Result count : 1
Termination status : LOCALLY_SOLVED
Message from the solver:
"Solve_Succeeded"
* Candidate solution (result #1)
Primal status : FEASIBLE_POINT
Dual status : FEASIBLE_POINT
Objective value : -4.18983e+05
Dual objective value : 0.00000e+00
* Work counters
Solve time (sec) : 4.80948e-01
Barrier iterations : 32
The optimal objective value is:
objective_value(model)
-418983.48680640775
and the optimal upper-level decision variables $x$ are:
value.(x)
2-element Vector{Float64}:
154.97862337234338
180.0096143098799
To compute the optimal lower-level decision variables, we need to call solve_lower_level
with the optimal upper-level decision variables:
_, y = solve_lower_level(value.(x)...)
y
2-element Vector{Float64}:
7.072593961143734
5.94656989283847
Improving performance
Our solution approach works, but it has a performance problem: every time we need to compute the value, gradient, or Hessian of $V$, we have to re-solve the lower-level optimization problem. This is wasteful, because we will often call the gradient and Hessian at the same point, and so solving the problem twice with the same input repeats work unnecessarily.
We can work around this by using a cache:
mutable struct Cache
x::Any
f::Float64
y::Vector{Float64}
end
with a function to update the cache if needed:
function _update_if_needed(cache::Cache, x...)
if cache.x !== x
cache.f, cache.y = solve_lower_level(x...)
cache.x = x
end
return
end
_update_if_needed (generic function with 1 method)
Then, we define cached versions of out three functions which call _updated_if_needed
and return values from the cache.
function cached_f(cache::Cache, x...)
_update_if_needed(cache, x...)
return cache.f
end
function cached_∇f(cache::Cache, g::AbstractVector, x...)
_update_if_needed(cache, x...)
g[1] = 2 * x[1] * cache.y[1] - cache.y[1]^4
g[2] = 2 * x[2] * cache.y[2] - 2 * cache.y[2]^4
return
end
function cached_∇²f(cache::Cache, H::AbstractMatrix, x...)
_update_if_needed(cache, x...)
H[1, 1] = 2 * cache.y[1]
H[2, 2] = 2 * cache.y[2]
return
end
cached_∇²f (generic function with 1 method)
Now we're ready to setup and solve the upper level optimization problem:
model = Model(Ipopt.Optimizer)
@variable(model, x[1:2] >= 0)
cache = Cache(Float64[], NaN, Float64[])
@operator(
model,
op_cached_f,
2,
(x...) -> cached_f(cache, x...),
(g, x...) -> cached_∇f(cache, g, x...),
(H, x...) -> cached_∇²f(cache, H, x...),
)
@objective(model, Min, x[1]^2 + x[2]^2 + op_cached_f(x[1], x[2]))
optimize!(model)
@assert is_solved_and_feasible(model)
solution_summary(model)
* Solver : Ipopt
* Status
Result count : 1
Termination status : LOCALLY_SOLVED
Message from the solver:
"Solve_Succeeded"
* Candidate solution (result #1)
Primal status : FEASIBLE_POINT
Dual status : FEASIBLE_POINT
Objective value : -4.18983e+05
Dual objective value : 0.00000e+00
* Work counters
Solve time (sec) : 1.94901e-01
Barrier iterations : 32
an we can check we get the same objective value:
objective_value(model)
-418983.48680640775
and upper-level decision variable $x$:
value.(x)
2-element Vector{Float64}:
154.97862337234338
180.0096143098799