# Computing Hessians

The purpose of this tutorial is to demonstrate how to compute the Hessian of the Lagrangian of a nonlinear program.

This is an advanced tutorial that interacts with the low-level nonlinear interface of MathOptInterface.

By default, JuMP exports the `MOI` symbol as an alias for the MathOptInterface.jl package. We recommend making this more explicit in your code by adding the following line:

`import MathOptInterface as MOI`

Given a nonlinear program:

\[\begin{align} & \min_{x \in \mathbb{R}^n} & f(x) \\ & \;\;\text{s.t.} & l_i \le g_i(x) \le u_i, \quad i = 1, \ldots, m \end{align}\]

the Hessian of the Lagrangian is computed as:

\[H(x, \sigma, \mu) = \sigma\nabla^2 f(x) + \sum_{i=1}^m \mu_i \nabla^2 g_i(x)\]

where $x$ is a primal point, $\sigma$ is a scalar (typically $1$), and $\mu$ is a vector of weights corresponding to the Lagrangian dual of the constraints.

This tutorial uses the following packages:

```
using JuMP
import Ipopt
import LinearAlgebra
import Random
import SparseArrays
```

## The basic model

To demonstrate how to interact with the lower-level nonlinear interface, we need an example model. The exact model isn't important; we use the model from The Rosenbrock function tutorial, with some additional constraints to demonstrate various features of the lower-level interface.

```
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[i = 1:2], start = -i)
@constraint(model, g_1, x[1]^2 <= 1)
@NLconstraint(model, g_2, (x[1] + x[2])^2 <= 2)
@NLobjective(model, Min, (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2)
optimize!(model)
```
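Ipopt should report a locally optimal solution here. If you adapt this tutorial to your own model, it is worth confirming the solve succeeded before querying values, duals, and Hessians; a minimal check (added here, not part of the original model) is:

```
# Confirm the solver terminated successfully and has primal and dual
# solutions available before we query them below.
termination_status(model), primal_status(model), dual_status(model)
```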

## The analytic solution

With a little work, it is possible to analytically derive the correct Hessian:

```
function analytic_hessian(x, σ, μ)
    g_1_H = [2.0 0.0; 0.0 0.0]  # Hessian of g_1(x) = x[1]^2
    g_2_H = [2.0 2.0; 2.0 2.0]  # Hessian of g_2(x) = (x[1] + x[2])^2
    f_H = zeros(2, 2)           # Hessian of the Rosenbrock objective
    f_H[1, 1] = 2.0 + 1200.0 * x[1]^2 - 400.0 * x[2]
    f_H[1, 2] = f_H[2, 1] = -400.0 * x[1]
    f_H[2, 2] = 200.0
    # Weighted sum of the objective and constraint Hessians
    return σ * f_H + μ' * [g_1_H, g_2_H]
end
```

`analytic_hessian (generic function with 1 method)`

Here is the Hessian evaluated at various points:

`analytic_hessian([1, 1], 0, [0, 0])`

```
2×2 Matrix{Float64}:
 0.0  0.0
 0.0  0.0
```

`analytic_hessian([1, 1], 0, [1, 0])`

```
2×2 Matrix{Float64}:
 2.0  0.0
 0.0  0.0
```

`analytic_hessian([1, 1], 0, [0, 1])`

```
2×2 Matrix{Float64}:
 2.0  2.0
 2.0  2.0
```

`analytic_hessian([1, 1], 1, [0, 0])`

```
2×2 Matrix{Float64}:
  802.0  -400.0
 -400.0   200.0
```
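Because $H(x, \sigma, \mu)$ is linear in $\sigma$ and $\mu$, the objective and constraint pieces simply add up; a quick check of this (an observation added here, not in the original):

```
# Linearity in (σ, μ): the Lagrangian Hessian with σ = 1 and μ = [1, 1]
# equals the sum of the three pieces evaluated above.
analytic_hessian([1, 1], 1, [1, 1]) ==
    analytic_hessian([1, 1], 1, [0, 0]) +
    analytic_hessian([1, 1], 0, [1, 0]) +
    analytic_hessian([1, 1], 0, [0, 1])
```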

## Initializing the NLPEvaluator

JuMP stores all information relating to the nonlinear portions of the model in a `NLPEvaluator` struct:

`d = NLPEvaluator(model)`

```
Nonlinear.Evaluator with available features:
* :Grad
* :Jac
* :JacVec
* :Hess
* :HessVec
* :ExprGraph
```

Before computing anything with the `NLPEvaluator`, we need to initialize it. Use `MOI.features_available` to see what we can query:

`MOI.features_available(d)`

```
6-element Vector{Symbol}:
:Grad
:Jac
:JacVec
:Hess
:HessVec
:ExprGraph
```

Consult the MOI documentation for specifics. But to obtain the Hessian matrix, we need to initialize `:Hess`:

`MOI.initialize(d, [:Hess])`
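If you also need other derivatives, for example gradients and Jacobians for a custom solver loop, the same call accepts several symbols at once; this is a general MOI pattern rather than something required here, and only symbols returned by `MOI.features_available(d)` may be requested:

```
# Hypothetical alternative to the call above: request gradient, Jacobian,
# and Hessian evaluation support together.
MOI.initialize(d, [:Grad, :Jac, :Hess])
```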

MOI represents the Hessian as a sparse matrix. Get the sparsity pattern as follows:

`hessian_sparsity = MOI.hessian_lagrangian_structure(d)`

```
6-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 2)
(2, 1)
(1, 1)
(2, 2)
(2, 1)
```

The sparsity pattern has a few properties of interest:

- Each element `(i, j)` indicates a structural non-zero in row `i` and column `j`
- The list may contain duplicates, in which case we should add the values together
- The list does not need to be sorted
- The list may contain any mix of lower- or upper-triangular indices

This format matches Julia's sparse-triplet form of a SparseArray, so we can convert from the sparse Hessian representation to a Julia SparseArray as follows:

```
I = [i for (i, _) in hessian_sparsity]
J = [j for (_, j) in hessian_sparsity]
V = zeros(length(hessian_sparsity))
n = num_variables(model)
H = SparseArrays.sparse(I, J, V, n, n)
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
 0.0   ⋅
 0.0  0.0
```
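Note that `SparseArrays.sparse` adds duplicate `(i, j)` triplets together (its default `combine` function is `+`), which is exactly the "add duplicates together" convention of the sparsity pattern above. A minimal illustration:

```
# Two triplets at (1, 1) are combined into a single stored entry of 3.0.
SparseArrays.sparse([1, 1], [1, 1], [1.0, 2.0], 2, 2)
```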

Of course, knowing where the zeros are isn't very interesting. We really want to compute the value of the Hessian matrix at a point.

```
num_g = num_nonlinear_constraints(model)
MOI.eval_hessian_lagrangian(d, V, ones(n), 1.0, ones(num_g))
H = SparseArrays.sparse(I, J, V, n, n)
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
  804.0     ⋅
 -398.0  202.0
```
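As a cross-check (added here; the point is made later in the tutorial), the evaluator only sees the `@NLobjective` and the `@NLconstraint` `g_2`; the quadratic `@constraint` `g_1` is excluded. The lower triangle above should therefore match the analytic Hessian with $\sigma = 1$ and $\mu = [0, 1]$:

```
# Objective Hessian plus the Hessian of g_2 only; g_1 was added with
# @constraint, so the NLPEvaluator does not include it.
analytic_hessian(ones(2), 1.0, [0.0, 1.0])  # [804.0 -398.0; -398.0 202.0]
```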

In practice, we often want to compute the value of the Hessian at the optimal solution.

First, we compute the primal solution. To do so, we need a vector of the variables in the order that they were passed to the solver:

`x = all_variables(model)`

```
2-element Vector{VariableRef}:
x[1]
x[2]
```

Here `x[1]` is the variable that corresponds to column 1, and so on. Here's the optimal primal solution:

`x_optimal = value.(x)`

```
2-element Vector{Float64}:
0.7903587551555908
0.6238546250475736
```

Next, we need the optimal dual solution associated with the nonlinear constraints:

`y_optimal = dual.(all_nonlinear_constraints(model))`

```
1-element Vector{Float64}:
-0.057440893636909414
```

Now we can compute the Hessian at the optimal primal-dual point:

```
MOI.eval_hessian_lagrangian(d, V, x_optimal, 1.0, y_optimal)
H = SparseArrays.sparse(I, J, V, n, n)
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
  501.944      ⋅
 -316.258  199.885
```

However, this Hessian isn't quite right: the sparsity pattern listed only lower-triangular entries, so the matrix we assembled isn't symmetric. We can fix this by filling in the appropriate off-diagonal terms:

```
function fill_off_diagonal(H)
    # H + H' mirrors the off-diagonal entries but doubles the diagonal,
    # so halve the diagonal entries afterwards.
    ret = H + H'
    row_vals = SparseArrays.rowvals(ret)
    non_zeros = SparseArrays.nonzeros(ret)
    for col in 1:size(ret, 2)
        for i in SparseArrays.nzrange(ret, col)
            if col == row_vals[i]
                non_zeros[i] /= 2
            end
        end
    end
    return ret
end
fill_off_diagonal(H)
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 4 stored entries:
  501.944  -316.258
 -316.258   199.885
```
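To double-check the result, `LinearAlgebra.issymmetric` should now report `true` (a small verification added here, not in the original):

```
# The symmetrized Hessian should pass LinearAlgebra's symmetry check.
LinearAlgebra.issymmetric(fill_off_diagonal(H))
```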

Moreover, this Hessian only accounts for the objective and constraints entered using `@NLobjective` and `@NLconstraint`. If we want to take quadratic objectives and constraints written using `@objective` or `@constraint` into account, we'll need to handle them separately.

If you don't want to do this, you can replace calls to `@objective` and `@constraint` with `@NLobjective` and `@NLconstraint`.

## Hessians from QuadExpr functions

To compute the Hessian from a quadratic expression, let's see how JuMP represents a quadratic constraint:

`f = constraint_object(g_1).func`

\[ x_{1}^2 \]

`f` is a quadratic expression of the form:

`f(x) = Σqᵢⱼ * xᵢ * xⱼ + Σaᵢ xᵢ + c`

So `∇²f(x)` is the matrix formed by `[qᵢⱼ]ᵢⱼ` if `i != j` and `2[qᵢⱼ]ᵢⱼ` if `i = j`.
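To see where the `qᵢⱼ` coefficients live, inspect `f.terms`: it maps unordered pairs of variables to coefficients. For the `x₁²` constraint we expect a single pair `(x[1], x[1])` with coefficient `1.0` (a small inspection added here for illustration):

```
# Each key is an UnorderedPair with fields `a` and `b`. For f = x[1]^2 there
# should be one entry, (x[1], x[1]) => 1.0, contributing 2 * 1.0 to H[1, 1].
for (vars, coef) in f.terms
    println(vars.a, " * ", vars.b, " => ", coef)
end
```

With that in mind, we can accumulate the μ-weighted Hessian contributions of a `QuadExpr` into `H`: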

```
variables_to_column = Dict(x[i] => i for i in 1:n)
function add_to_hessian(H, f::QuadExpr, μ)
    for (vars, coef) in f.terms
        i = variables_to_column[vars.a]
        j = variables_to_column[vars.b]
        if i == j
            # Diagonal entries: the second derivative of qᵢᵢ * xᵢ² is 2qᵢᵢ.
            H[i, j] += 2 * μ * coef
        else
            H[i, j] += μ * coef
        end
    end
    return
end
```

`add_to_hessian (generic function with 1 method)`
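As a quick usage check (added here, not in the original), applying `add_to_hessian` with `μ = 1` to `f = x₁²` on a fresh dense matrix should reproduce the `g_1_H` block from `analytic_hessian`:

```
# Accumulate the Hessian of f(x) = x[1]^2 into a zero matrix; expect
# [2.0 0.0; 0.0 0.0], the same as g_1_H above.
H_demo = zeros(n, n)
add_to_hessian(H_demo, f, 1.0)
H_demo
```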

If the function `f` is not a `QuadExpr`, do nothing because it is an `AffExpr` or a `VariableRef`. In both cases, the second derivative is zero.

`add_to_hessian(H, f::Any, μ) = nothing`

`add_to_hessian (generic function with 2 methods)`

Then we iterate over all constraints in the model and add their Hessian components:

```
for (F, S) in list_of_constraint_types(model)
    for cref in all_constraints(model, F, S)
        f = constraint_object(cref).func
        add_to_hessian(H, f, dual(cref))
    end
end
H
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
  501.944      ⋅
 -316.258  199.885
```

The values are unchanged because `g_1` is inactive at the optimal solution (x₁² ≈ 0.62 < 1), so its dual is numerically zero. Finally, we need to take into account the objective function:

```
add_to_hessian(H, objective_function(model), 1.0)
fill_off_diagonal(H)
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 4 stored entries:
  501.944  -316.258
 -316.258   199.885
```

Putting everything together:

```
function compute_optimal_hessian(model)
    # Evaluate the Hessian of the nonlinear (@NLobjective / @NLconstraint)
    # part of the Lagrangian at the optimal primal-dual point.
    d = NLPEvaluator(model)
    MOI.initialize(d, [:Hess])
    hessian_sparsity = MOI.hessian_lagrangian_structure(d)
    I = [i for (i, _) in hessian_sparsity]
    J = [j for (_, j) in hessian_sparsity]
    V = zeros(length(hessian_sparsity))
    x = all_variables(model)
    x_optimal = value.(x)
    y_optimal = dual.(all_nonlinear_constraints(model))
    MOI.eval_hessian_lagrangian(d, V, x_optimal, 1.0, y_optimal)
    n = num_variables(model)
    H = SparseArrays.sparse(I, J, V, n, n)
    # Add the Hessians of the quadratic (@objective / @constraint) parts.
    vmap = Dict(x[i] => i for i in 1:n)
    add_to_hessian(H, f::Any, μ) = nothing
    function add_to_hessian(H, f::QuadExpr, μ)
        for (vars, coef) in f.terms
            if vars.a != vars.b
                H[vmap[vars.a], vmap[vars.b]] += μ * coef
            else
                H[vmap[vars.a], vmap[vars.b]] += 2 * μ * coef
            end
        end
    end
    for (F, S) in list_of_constraint_types(model)
        for cref in all_constraints(model, F, S)
            add_to_hessian(H, constraint_object(cref).func, dual(cref))
        end
    end
    add_to_hessian(H, objective_function(model), 1.0)
    # Symmetrize and return a dense matrix.
    return Matrix(fill_off_diagonal(H))
end
H_star = compute_optimal_hessian(model)
```

```
2×2 Matrix{Float64}:
  501.944  -316.258
 -316.258   199.885
```

We can compare our solution against the analytic solution:

`analytic_hessian(value.(x), 1.0, dual.([g_1, g_2]))`

```
2×2 Matrix{Float64}:
  501.944  -316.258
 -316.258   199.885
```
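The two matrices agree; a programmatic version of this comparison (added here for convenience) is:

```
# The AD-based Hessian and the hand-derived Hessian should agree to within
# floating-point and solver tolerances.
isapprox(H_star, analytic_hessian(value.(x), 1.0, dual.([g_1, g_2])); atol = 1e-6)
```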

If we look at the eigenvalues of the Hessian:

`LinearAlgebra.eigvals(H_star)`

```
2-element Vector{Float64}:
0.44439959255495864
701.3843408744132
```

we see that they are all positive. Therefore, the Hessian is positive definite, and so the solution found by Ipopt is a local minimizer.
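An equivalent programmatic test (not in the original) is `LinearAlgebra.isposdef`, which attempts a Cholesky factorization:

```
# Returns true when the symmetric matrix is positive definite.
LinearAlgebra.isposdef(H_star)
```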

This tutorial was generated using Literate.jl. View the source `.jl` file on GitHub.