Benders decomposition
This tutorial was originally contributed by Shuvomoy Das Gupta.
This tutorial describes how to implement Benders decomposition in JuMP. It uses the following packages:
using JuMP
import GLPK
import Printf
Theory
Benders decomposition is a useful algorithm for solving convex optimization problems with a large number of variables. It works best when a larger problem can be decomposed into two (or more) smaller problems that are individually much easier to solve. This tutorial demonstrates Benders decomposition on the following mixed-integer linear program:
\[\begin{aligned} \text{min} \ & c_1^\top x+c_2^\top y \\ \text{subject to} \ & A_1 x+ A_2 y \le b \\ & x \ge 0 \\ & y \ge 0 \\ & x \in \mathbb{Z}^n \end{aligned}\]
where $b \in \mathbb{R}^m$, $A_1 \in \mathbb{R}^{m \times n}$, $A_2 \in \mathbb{R}^{m \times p}$ and $\mathbb{Z}$ is the set of integers.
Any mixed integer programming problem can be written in the form above.
If there are relatively few integer variables, and many more continuous variables, then it may be beneficial to decompose the problem into a small problem containing only integer variables and a linear program containing only continuous variables. Hopefully, the linear program will be much easier to solve in isolation than in the full mixed-integer linear program.
For example, if we knew a feasible solution for $x$, we could obtain a solution for $y$ by solving:
\[\begin{aligned} V_2(x) = & \text{min} \ & c_2^\top y \\ & \text{subject to} \ & A_2 y \le b - A_1 x & \quad [\pi] \\ & & y \ge 0, \end{aligned}\]
where $\pi$ is the dual variable associated with the constraints. Because this is a linear program, it is easy to solve.
Replacing the $c_2^\top y$ component of the objective in our original problem with $V_2$ yields:
\[\begin{aligned} \text{min} \ & c_1^\top x + V_2(x) \\ \text{subject to} \ & x \ge 0 \\ & x \in \mathbb{Z}^n \end{aligned}\]
This problem looks a lot simpler to solve, but we need to do something else with $V_2$ first.
Because $x$ is a constant that appears on the right-hand side of the constraints, $V_2$ is a convex function with respect to $x$, and the dual variable $\pi$ can be multiplied by $-A_1$ to obtain a subgradient of $V_2(x)$ with respect to $x$. Therefore, if we have a candidate solution $x_k$, then we can solve $V_2(x_k)$ and obtain a feasible dual vector $\pi_k$. Using these values, we can construct a first-order Taylor-series approximation of $V_2$ about the point $x_k$:
\[V_2(x) \ge V_2(x_k) + -\pi_k^\top A_1 (x - x_k).\]
By convexity, we know that this inequality holds for all $x$, and we call these inequalities cuts.
Benders decomposition is an iterative technique that replaces $V_2(x)$ with a new decision variable $\theta$, and approximates it from below using cuts:
\[\begin{aligned} V_1^K = & \text{min} \ & c_1^\top x + \theta \\ & \text{subject to} \ & x \ge 0 \\ & \ & x \in \mathbb{Z}^n \\ & \ & \theta \ge M \\ & \ & \theta \ge V_2(x_k) + \pi_k^\top(x - x_k) & \quad \forall k = 1,\ldots,K. \end{aligned}\]
This integer program is called the first-stage subproblem.
To generate cuts, we solve $V_1^K$ to obtain a candidate first-stage solution $x_k$, then we use that solution to solve $V_2(x_k)$. Then, using the optimal objective value and dual solution from $V_2$, we add a new cut to form $V_1^{K+1}$ and repeat.
Bounds
Due to convexity, we know that $V_2(x) \ge \theta$ for all $x$. Therefore, the optimal objective value of $V_1^K$ provides a valid lower bound on the objective value of the full problem. In addition, if we take a feasible solution for $x$ from the first-stage problem, then $c_1^\top x + V_2(x)$ is a valid upper bound on the objective value of the full problem.
Benders decomposition uses the lower and upper bounds to determine when it has found the global optimal solution.
Input data
As an example for this tutorial, we use the input data is from page 139 of Garfinkel, R. & Nemhauser, G. L. Integer programming. (Wiley, 1972).
c_1 = [1, 4]
c_2 = [2, 3]
dim_x = length(c_1)
dim_y = length(c_2)
b = [-2; -3]
A_1 = [1 -3; -1 -3]
A_2 = [1 -2; -1 -1]
M = -1000;
Iterative method
This is a basic implementation for pedagogical purposes. We haven't discussed Benders feasibility cuts, or any of the computational tricks that are required to build a performant implementation for large-scale problems. See In-place iterative method for one improvement that helps computation time.
We start by formulating the first-stage subproblem:
model = Model(GLPK.Optimizer)
@variable(model, x[1:dim_x] >= 0, Int)
@variable(model, θ >= M)
@objective(model, Min, c_1' * x + θ)
print(model)
Min x[1] + 4 x[2] + θ
Subject to
x[1] ≥ 0
x[2] ≥ 0
θ ≥ -1000
x[1] integer
x[2] integer
For the next step, we need a function that takes a first-stage candidate solution x
and returns the optimal solution from the second-stage subproblem:
function solve_subproblem(x)
model = Model(GLPK.Optimizer)
@variable(model, y[1:dim_y] >= 0)
con = @constraint(model, A_2 * y .<= b - A_1 * x)
@objective(model, Min, c_2' * y)
optimize!(model)
@assert termination_status(model) == OPTIMAL
return (obj = objective_value(model), y = value.(y), π = dual.(con))
end
solve_subproblem (generic function with 1 method)
Note that solve_subproblem
returns a NamedTuple
of the objective value, the optimal primal solution for y
, and the optimal dual solution for π
.
We're almost ready for our optimization loop, but first, here's a helpful function for logging:
function print_iteration(k, args...)
f(x) = Printf.@sprintf("%12.4e", x)
println(lpad(k, 9), " ", join(f.(args), " "))
return
end
print_iteration (generic function with 1 method)
We also need to put a limit on the number of iterations before termination:
MAXIMUM_ITERATIONS = 100
100
And a way to check if the lower and upper bounds are close-enough to terminate:
ABSOLUTE_OPTIMALITY_GAP = 1e-6
1.0e-6
Now we're ready to iterate Benders decomposition:
println("Iteration Lower Bound Upper Bound Gap")
for k in 1:MAXIMUM_ITERATIONS
optimize!(model)
lower_bound = objective_value(model)
x_k = value.(x)
ret = solve_subproblem(x_k)
upper_bound = c_1' * x_k + ret.obj
gap = (upper_bound - lower_bound) / upper_bound
print_iteration(k, lower_bound, upper_bound, gap)
if gap < ABSOLUTE_OPTIMALITY_GAP
println("Terminating with the optimal solution")
break
end
cut = @constraint(model, θ >= ret.obj + -ret.π' * A_1 * (x .- x_k))
@info "Adding the cut $(cut)"
end
Iteration Lower Bound Upper Bound Gap
1 -1.0000e+03 7.6667e+00 1.3143e+02
[ Info: Adding the cut 2 x[1] + 8 x[2] + θ ≥ 7.666666666666666
2 -4.9600e+02 1.2630e+03 1.3927e+00
[ Info: Adding the cut -1.5 x[1] + 4.5 x[2] + θ ≥ 3
3 -1.0800e+02 8.8800e+02 1.1216e+00
[ Info: Adding the cut θ ≥ 0
4 4.0000e+00 4.0000e+00 0.0000e+00
Terminating with the optimal solution
Finally, we can obtain the optimal solution
optimize!(model)
x_optimal = value.(x)
2-element Vector{Float64}:
0.0
1.0
optimal_ret = solve_subproblem(x_optimal)
y_optimal = optimal_ret.y
2-element Vector{Float64}:
0.0
0.0
Callback method
The Iterative method section implemented Benders decomposition using a loop. In each iteration, we re-solved the first-stage subproblem to generate a candidate solution. However, modern MILP solvers such as CPLEX, Gurobi, and GLPK provide lazy constraint callbacks which allow us to add new cuts while the solver is running. This can be more efficient than an iterative method because we can avoid repeating work such as solving the root node of the first-stage MILP at each iteration.
For more information on callbacks, read the page Solver-independent callbacks.
As before, we construct the same first-stage subproblem:
lazy_model = Model(GLPK.Optimizer)
@variable(lazy_model, x[1:dim_x] >= 0, Int)
@variable(lazy_model, θ >= M)
@objective(lazy_model, Min, θ)
print(lazy_model)
Min θ
Subject to
x[1] ≥ 0
x[2] ≥ 0
θ ≥ -1000
x[1] integer
x[2] integer
What differs is that we write a callback function instead of a loop:
k = 0
"""
my_callback(cb_data)
A callback that implements Benders decomposition. Note how similar it is to the
inner loop of the iterative method.
"""
function my_callback(cb_data)
global k += 1
x_k = callback_value.(cb_data, x)
θ_k = callback_value(cb_data, θ)
lower_bound = c_1' * x_k + θ_k
ret = solve_subproblem(x_k)
upper_bound = c_1' * x_k + c_2' * ret.y
gap = (upper_bound - lower_bound) / upper_bound
print_iteration(k, lower_bound, upper_bound, gap)
if gap < ABSOLUTE_OPTIMALITY_GAP
println("Terminating with the optimal solution")
return
end
cut = @build_constraint(θ >= ret.obj + -ret.π' * A_1 * (x .- x_k))
MOI.submit(model, MOI.LazyConstraint(cb_data), cut)
return
end
set_attribute(lazy_model, MOI.LazyConstraintCallback(), my_callback)
Now when we optimize!, our callback is run:
optimize!(lazy_model)
1 -1.0000e+03 7.6667e+00 1.3143e+02
2 -4.9617e+02 5.0383e+02 1.9848e+00
3 3.8333e+00 4.0833e+00 6.1224e-02
4 4.0000e+00 4.0000e+00 0.0000e+00
Terminating with the optimal solution
Note how this problem also takes 4 iterations to converge, but the sequence of bounds is different compared to the iterative method.
Finally, we can obtain the optimal solution:
x_optimal = value.(x)
2-element Vector{Float64}:
0.0
1.0
optimal_ret = solve_subproblem(x_optimal)
y_optimal = optimal_ret.y
2-element Vector{Float64}:
0.0
0.0
In-place iterative method
Our implementation of the iterative method has a problem: every time we need to solve the subproblem, we must rebuild it from scratch. This is expensive, and it can be the bottleneck in the solution process. We can improve our implementation by using re-using the subproblem between solves.
First, we create our first-stage problem as usual:
model = Model(GLPK.Optimizer)
@variable(model, x[1:dim_x] >= 0, Int)
@variable(model, θ >= M)
@objective(model, Min, c_1' * x + θ)
print(model)
Min x[1] + 4 x[2] + θ
Subject to
x[1] ≥ 0
x[2] ≥ 0
θ ≥ -1000
x[1] integer
x[2] integer
Then, instead of building the subproblem in a function, we build it once here:
subproblem = Model(GLPK.Optimizer)
@variable(subproblem, x_copy[1:dim_x])
@variable(subproblem, y[1:dim_y] >= 0)
@constraint(subproblem, A_1 * x_copy + A_2 * y .<= b)
@objective(subproblem, Min, c_2' * y)
print(subproblem)
Min 2 y[1] + 3 y[2]
Subject to
x_copy[1] - 3 x_copy[2] + y[1] - 2 y[2] ≤ -2
-x_copy[1] - 3 x_copy[2] - y[1] - y[2] ≤ -3
y[1] ≥ 0
y[2] ≥ 0
This formulation is slightly different. We have included a copy of the x variables, x_copy
, and used x_copy
in the left-hand side of the constraints.
Our function to solve the subproblem is also slightly different. First, we need to fix the value of the x_copy
variables to the value of x
from the first-stage problem, and second, we compute the dual using the reduced_cost
of x_copy
, not the dual of the linear constraints:
function solve_subproblem(model, x)
fix.(model[:x_copy], x)
optimize!(model)
@assert termination_status(model) == OPTIMAL
return (
obj = objective_value(model),
y = value.(model[:y]),
π = reduced_cost.(model[:x_copy]),
)
end
solve_subproblem (generic function with 2 methods)
Now we're ready to iterate our in-place Benders decomposition, but this time the cut computation is slightly different. Because we used reduced_cost
on the x_copy
variables, the value of π
is a valid subgradient on the objective of subproblem
with respect to x
. Therefore, we don't need to multiply the duals by -A_1
, and so our cut uses ret.π'
instead of -ret.π' * A_1
:
println("Iteration Lower Bound Upper Bound Gap")
for k in 1:MAXIMUM_ITERATIONS
optimize!(model)
lower_bound = objective_value(model)
x_k = value.(x)
ret = solve_subproblem(subproblem, x_k)
upper_bound = c_1' * x_k + ret.obj
gap = (upper_bound - lower_bound) / upper_bound
print_iteration(k, lower_bound, upper_bound, gap)
if gap < ABSOLUTE_OPTIMALITY_GAP
println("Terminating with the optimal solution")
break
end
cut = @constraint(model, θ >= ret.obj + ret.π' * (x .- x_k))
@info "Adding the cut $(cut)"
end
Iteration Lower Bound Upper Bound Gap
1 -1.0000e+03 7.6667e+00 1.3143e+02
[ Info: Adding the cut 2 x[1] + 8 x[2] + θ ≥ 7.666666666666666
2 -4.9600e+02 1.2630e+03 1.3927e+00
[ Info: Adding the cut -1.5 x[1] + 4.5 x[2] + θ ≥ 3
3 -1.0800e+02 8.8800e+02 1.1216e+00
[ Info: Adding the cut θ ≥ 0
4 4.0000e+00 4.0000e+00 0.0000e+00
Terminating with the optimal solution
Finally, we can obtain the optimal solution:
optimize!(model)
x_optimal = value.(x)
2-element Vector{Float64}:
0.0
1.0
optimal_ret = solve_subproblem(subproblem, x_optimal)
y_optimal = optimal_ret.y
2-element Vector{Float64}:
0.0
0.0
For larger problems, the benefit of re-using the same subproblem and not needing to multiply the duals by A_1
in the cut coefficient usually outweights the cost of needing a copy of the x
variables in the subproblem.
As a secondary benefit, because we no longer need an explicit representation of A_1
in the cut, we can build the model
and subproblem
formulations using arbitrary JuMP syntax; they do not need to be in matrix form.
This tutorial was generated using Literate.jl. View the source .jl
file on GitHub.