Benders decomposition

Originally Contributed by: Shuvomoy Das Gupta

This tutorial describes how to implement Benders decomposition in JuMP. It uses the following packages:

using JuMP
import GLPK
import Printf

Theory

Benders decomposition is a useful algorithm for solving convex optimization problems with a large number of variables. It works best when a larger problem can be decomposed into two (or more) smaller problems that are individually much easier to solve. This tutorial demonstrates Benders decomposition on the following mixed-integer linear program:

\[\begin{aligned} \text{min} \ & c_1^\top x+c_2^\top y \\ \text{subject to} \ & A_1 x+ A_2 y \le b \\ & x \ge 0 \\ & y \ge 0 \\ & x \in \mathbb{Z}^n \end{aligned}\]

where $b \in \mathbb{R}^m$, $A_1 \in \mathbb{R}^{m \times n}$, $A_2 \in \mathbb{R}^{m \times p}$ and $\mathbb{Z}$ is the set of integers.

Any mixed integer programming problem can be written in the form above.

If there are relatively few integer variables, and many more continuous variables, then it may be beneficial to decompose the problem into a small problem containing only integer variables and a linear program containing only continuous variables. Hopefully, the linear program will be much easier to solve in isolation than in the full mixed-integer linear program.

For example, if we knew a feasible solution for $x$, we could obtain a solution for $y$ by solving:

\[\begin{aligned} V_2(x) = & \text{min} \ & c_2^\top y \\ & \text{subject to} \ & A_2 y \le b - A_1 x & \quad [\pi] \\ & & y \ge 0, \end{aligned}\]

where $\pi$ is the dual variable associated with the constraints. Because this is a linear program, it is easy to solve.

Replacing the $c_2^\top y$ component of the objective in our original problem with $V_2$ yields:

\[\begin{aligned} \text{min} \ & c_1^\top x + V_2(x) \\ \text{subject to} \ & x \ge 0 \\ & x \in \mathbb{Z}^n \end{aligned}\]

This problem looks a lot simpler to solve, but $V_2$ is defined implicitly as the optimal value of another optimization problem, so we first need a tractable way to represent it.

In the second-stage problem, $x$ is fixed and appears only on the right-hand side of the constraints. As a result, $V_2$ is a convex function with respect to $x$, and the dual variable $\pi$ can be multiplied by $-A_1$ to obtain a subgradient of $V_2(x)$ with respect to $x$. Therefore, if we have a candidate solution $x_k$, then we can solve $V_2(x_k)$ and obtain an optimal dual vector $\pi_k$. Using these values, we can construct a first-order Taylor-series approximation of $V_2$ about the point $x_k$:

\[V_2(x) \ge V_2(x_k) - \pi_k^\top A_1 (x - x_k).\]

By convexity, we know that this inequality holds for all $x$, and we call these inequalities cuts.
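
One way to verify this inequality is through linear programming duality (a short aside; the notation is the same as in the formulation above). The dual of the second-stage problem is:

\[\begin{aligned} \text{max} \ & \pi^\top (b - A_1 x) \\ \text{subject to} \ & A_2^\top \pi \le c_2 \\ & \pi \le 0. \end{aligned}\]

The dual feasible region does not depend on $x$, so an optimal dual solution $\pi_k$ for $V_2(x_k)$ remains feasible for any $x$. Strong duality gives $V_2(x_k) = \pi_k^\top (b - A_1 x_k)$, and weak duality gives $V_2(x) \ge \pi_k^\top (b - A_1 x) = V_2(x_k) - \pi_k^\top A_1 (x - x_k)$, which is exactly the cut above.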

Benders decomposition is an iterative technique that replaces $V_2(x)$ with a new decision variable $\theta$, and approximates it from below using cuts:

\[\begin{aligned} V_1^K = & \text{min} \ & c_1^\top x + \theta \\ & \text{subject to} \ & x \ge 0 \\ & \ & x \in \mathbb{Z}^n \\ & \ & \theta \ge M \\ & \ & \theta \ge V_2(x_k) - \pi_k^\top A_1 (x - x_k) & \quad \forall k = 1,\ldots,K. \end{aligned}\]

This integer program is called the first-stage subproblem.

To generate cuts, we solve $V_1^K$ to obtain a candidate first-stage solution $x_k$, then we use that solution to solve $V_2(x_k)$. Then, using the optimal objective value and dual solution from $V_2$, we add a new cut to form $V_1^{K+1}$ and repeat.

Bounds

Due to convexity, we know that $V_2(x) \ge \theta$ for all $x$. Therefore, the optimal objective value of $V_1^K$ provides a valid lower bound on the objective value of the full problem. In addition, if we take a feasible solution for $x$ from the first-stage problem, then $c_1^\top x + V_2(x)$ is a valid upper bound on the objective value of the full problem.

Benders decomposition uses the lower and upper bounds to determine when it has found the global optimal solution.
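
In the implementation below, after solving the first-stage problem for a candidate $x_k$, we take

\[\text{lower bound} = V_1^K, \qquad \text{upper bound} = c_1^\top x_k + V_2(x_k),\]

and we terminate once the relative gap $(\text{upper bound} - \text{lower bound}) / \text{upper bound}$ falls below a small tolerance.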

Input data

As an example for this tutorial, we use the input data from page 139 of Garfinkel, R. & Nemhauser, G. L. Integer programming. (Wiley, 1972).

c_1 = [1, 4]
c_2 = [2, 3]
dim_x = length(c_1)
dim_y = length(c_2)
b = [-2; -3]
A_1 = [1 -3; -1 -3]
A_2 = [1 -2; -1 -1]
M = -1000;
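
Before decomposing anything, it can help to see what the full problem looks like. The following sketch is not part of the original tutorial, and the names full_model, x_full, and y_full are ours; it builds the monolithic mixed-integer program and solves it directly with GLPK, giving a reference objective value to compare against the decomposition below:

# A sketch: solve min c_1'x + c_2'y subject to A_1 x + A_2 y <= b,
# x >= 0 integer, y >= 0, as one mixed-integer linear program.
full_model = Model(GLPK.Optimizer)
@variable(full_model, x_full[1:dim_x] >= 0, Int)
@variable(full_model, y_full[1:dim_y] >= 0)
@constraint(full_model, A_1 * x_full + A_2 * y_full .<= b)
@objective(full_model, Min, c_1' * x_full + c_2' * y_full)
optimize!(full_model)
# This objective value should match the one found by Benders decomposition below.
objective_value(full_model)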

Iterative method

Warning

This is a basic implementation for pedagogical purposes. We haven't discussed Benders feasibility cuts, or any of the computational tricks that are required to build a performant implementation for large-scale problems.

We start by formulating the first-stage subproblem:

model = Model(GLPK.Optimizer)
@variable(model, x[1:dim_x] >= 0, Int)
@variable(model, θ >= M)
@objective(model, Min, c_1' * x + θ)
print(model)
Min x[1] + 4 x[2] + θ
Subject to
 x[1] ≥ 0.0
 x[2] ≥ 0.0
 θ ≥ -1000.0
 x[1] integer
 x[2] integer

For the next step, we need a function that takes a first-stage candidate solution x and returns the optimal solution from the second-stage subproblem:

function solve_subproblem(x)
    model = Model(GLPK.Optimizer)
    @variable(model, y[1:dim_y] >= 0)
    con = @constraint(model, A_2 * y .<= b - A_1 * x)
    @objective(model, Min, c_2' * y)
    optimize!(model)
    @assert termination_status(model) == OPTIMAL
    return (obj = objective_value(model), y = value.(y), π = dual.(con))
end
solve_subproblem (generic function with 1 method)

Note that solve_subproblem returns a NamedTuple of the objective value, the optimal primal solution for y, and the optimal dual solution for π.
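
For example, a hypothetical call with the candidate x = [0, 0] (not a step of the algorithm, just a quick check of the function) looks like this:

trial = solve_subproblem([0.0, 0.0])
# trial.obj should be roughly 7.67; because c_1' * [0, 0] == 0, this matches the
# first upper bound printed in the iteration log below.
trial.obj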

We're almost ready for our optimization loop, but first, here's a helpful function for logging:

function print_iteration(k, args...)
    f(x) = Printf.@sprintf("%12.4e", x)
    println(lpad(k, 9), " ", join(f.(args), " "))
    return
end
print_iteration (generic function with 1 method)
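
For example, a hypothetical call using the values from the first row of the log below pads the iteration number to nine characters and formats each remaining value in scientific notation:

print_iteration(1, -1000.0, 7.6667, 131.43)
# prints something like:
#         1  -1.0000e+03   7.6667e+00   1.3143e+02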

We also need to put a limit on the number of iterations before termination:

MAXIMUM_ITERATIONS = 100
100

And a way to check if the lower and upper bounds are close enough to terminate:

ABSOLUTE_OPTIMALITY_GAP = 1e-6
1.0e-6

Now we're ready to iterate Benders decomposition:

println("Iteration  Lower Bound  Upper Bound          Gap")
for k in 1:MAXIMUM_ITERATIONS
    optimize!(model)
    lower_bound = objective_value(model)
    x_k = value.(x)
    ret = solve_subproblem(x_k)
    upper_bound = c_1' * x_k + ret.obj
    gap = (upper_bound - lower_bound) / upper_bound
    print_iteration(k, lower_bound, upper_bound, gap)
    if gap < ABSOLUTE_OPTIMALITY_GAP
        println("Terminating with the optimal solution")
        break
    end
    cut = @constraint(model, θ >= ret.obj + -ret.π' * A_1 * (x .- x_k))
    @info "Adding the cut $(cut)"
end
Iteration  Lower Bound  Upper Bound          Gap
        1  -1.0000e+03   7.6667e+00   1.3143e+02
[ Info: Adding the cut 2 x[1] + 8 x[2] + θ ≥ 7.666666666666666
        2  -4.9600e+02   1.2630e+03   1.3927e+00
[ Info: Adding the cut -1.5 x[1] + 4.5 x[2] + θ ≥ 3.0
        3  -1.0800e+02   8.8800e+02   1.1216e+00
[ Info: Adding the cut θ ≥ 0.0
        4   4.0000e+00   4.0000e+00   0.0000e+00
Terminating with the optimal solution

Finally, we can obtain the optimal solution:

optimize!(model)
x_optimal = value.(x)
2-element Vector{Float64}:
 0.0
 1.0
optimal_ret = solve_subproblem(x_optimal)
y_optimal = optimal_ret.y
2-element Vector{Float64}:
 0.0
 0.0
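
As a quick check (not part of the original tutorial), we can recombine the two stages and confirm that the objective value matches the converged bound from the log above:

# First-stage cost plus second-stage cost; this should equal 4.0, the final
# lower and upper bound reported above.
c_1' * x_optimal + optimal_ret.obj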

Callback method

The Iterative method section implemented Benders decomposition using a loop. In each iteration, we re-solved the first-stage subproblem to generate a candidate solution. However, modern MILP solvers such as CPLEX, Gurobi, and GLPK provide lazy constraint callbacks which allow us to add new cuts while the solver is running. This can be more efficient than an iterative method because we can avoid repeating work such as solving the root node of the first-stage MILP at each iteration.

Tip

For more information on callbacks, read the page Solver-independent callbacks.

As before, we construct the same first-stage subproblem:

lazy_model = Model(GLPK.Optimizer)
@variable(lazy_model, x[1:dim_x] >= 0, Int)
@variable(lazy_model, θ >= M)
@objective(lazy_model, Min, c_1' * x + θ)
print(lazy_model)
Min x[1] + 4 x[2] + θ
Subject to
 x[1] ≥ 0.0
 x[2] ≥ 0.0
 θ ≥ -1000.0
 x[1] integer
 x[2] integer

What differs is that we write a callback function instead of a loop:

k = 0

"""
    my_callback(cb_data)

A callback that implements Benders decomposition. Note how similar it is to the
inner loop of the iterative method.
"""
function my_callback(cb_data)
    global k += 1
    x_k = callback_value.(cb_data, x)
    θ_k = callback_value(cb_data, θ)
    lower_bound = c_1' * x_k + θ_k
    ret = solve_subproblem(x_k)
    upper_bound = c_1' * x_k + c_2' * ret.y
    gap = (upper_bound - lower_bound) / upper_bound
    print_iteration(k, lower_bound, upper_bound, gap)
    if gap < ABSOLUTE_OPTIMALITY_GAP
        println("Terminating with the optimal solution")
        return
    end
    cut = @build_constraint(θ >= ret.obj + -ret.π' * A_1 * (x .- x_k))
    MOI.submit(lazy_model, MOI.LazyConstraint(cb_data), cut)
    return
end

MOI.set(lazy_model, MOI.LazyConstraintCallback(), my_callback)

Now when we optimize!, our callback is run:

optimize!(lazy_model)
        1  -1.0000e+03   7.6667e+00   1.3143e+02
        2  -4.9617e+02   5.0383e+02   1.9848e+00
        3   3.8333e+00   4.0833e+00   6.1224e-02
        4   4.0000e+00   4.0000e+00   0.0000e+00
Terminating with the optimal solution

Note how this method also takes 4 iterations to converge, but the sequence of bounds is different from the iterative method because the cuts are added at candidate solutions found during a single solve of the first-stage problem.

Finally, we can obtain the optimal solution:

x_optimal = value.(x)
2-element Vector{Float64}:
 0.0
 1.0
optimal_ret = solve_subproblem(x_optimal)
y_optimal = optimal_ret.y
2-element Vector{Float64}:
 0.0
 0.0

Tip

This tutorial was generated using Literate.jl. View the source .jl file on GitHub.