Computing Hessians

The purpose of this tutorial is to demonstrate how to compute the Hessian of the Lagrangian of a nonlinear program.

Warning

This is an advanced tutorial that interacts with the low-level nonlinear interface of MathOptInterface.

By default, JuMP exports the MOI symbol as an alias for the MathOptInterface.jl package. We recommend making this more explicit in your code by adding the following lines:

import MathOptInterface
const MOI = MathOptInterface

Given a nonlinear program:

\begin{align} & \min_{x \in \mathbb{R}^n} & f(x) \\ & \;\;\text{s.t.} & l \le g_i(x) \le u \end{align}

the Hessian of the Lagrangian is computed as:

$$$H(x, \sigma, \mu) = \sigma\nabla^2 f(x) + \sum_{i=1}^m \mu_i \nabla^2 g_i(x)$$$

where $x$ is a primal point, $\sigma$ is a scalar (typically $1$), and $\mu$ is a vector of weights corresponding to the Lagrangian dual of the constraints.

This tutorial uses the following packages:

using JuMP
import Ipopt
import LinearAlgebra
import Random
import SparseArrays

The basic model

To demonstrate how to interact with the lower-level nonlinear interface, we need an example model. The exact model isn't important; we use the model from The Rosenbrock function tutorial, with some additional constraints to demonstrate various features of the lower-level interface.

model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[i = 1:2], start = -i)
@constraint(model, g_1, x[1]^2 <= 1)
@NLconstraint(model, g_2, (x[1] + x[2])^2 <= 2)
@NLobjective(model, Min, (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2)
optimize!(model)

The analytic solution

With a little work, it is possible to analytically derive the correct hessian:

function analytic_hessian(x, σ, μ)
g_1_H = [2.0 0.0; 0.0 0.0]
g_2_H = [2.0 2.0; 2.0 2.0]
f_H = zeros(2, 2)
f_H[1, 1] = 2.0 + 1200.0 * x[1]^2 - 400.0 * x[2]
f_H[1, 2] = f_H[2, 1] = -400.0 * x[1]
f_H[2, 2] = 200.0
return σ * f_H + μ' * [g_1_H, g_2_H]
end
analytic_hessian (generic function with 1 method)

Here are various points:

analytic_hessian([1, 1], 0, [0, 0])
2×2 Matrix{Float64}:
0.0  0.0
0.0  0.0
analytic_hessian([1, 1], 0, [1, 0])
2×2 Matrix{Float64}:
2.0  0.0
0.0  0.0
analytic_hessian([1, 1], 0, [0, 1])
2×2 Matrix{Float64}:
2.0  2.0
2.0  2.0
analytic_hessian([1, 1], 1, [0, 0])
2×2 Matrix{Float64}:
802.0  -400.0
-400.0   200.0

Initializing the NLPEvaluator

JuMP stores all information relating to the nonlinear portions of the model in a NLPEvaluator struct:

d = NLPEvaluator(model)
An NLPEvaluator with available features:
* :Jac
* :JacVec
* :ExprGraph
* :Hess
* :HessVec

Before computing anything with the NLPEvaluator, we need to initialize it. Use MOI.features_available to see what we can query:

MOI.features_available(d)
6-element Vector{Symbol}:
:Jac
:JacVec
:ExprGraph
:Hess
:HessVec

Consult the MOI documentation for specifics. But to obtain the Hessian matrix, we need to initialize :Hess:

MOI.initialize(d, [:Hess])

MOI represents the Hessian as a sparse matrix. Get the sparsity pattern as follows:

hessian_sparsity = MOI.hessian_lagrangian_structure(d)
6-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 2)
(2, 1)
(1, 1)
(2, 2)
(2, 1)

The sparsity pattern has a few properties of interest:

• Each element (i, j) indicates a structural non-zero in row i and column j
• The list may contain duplicates, in which case we should add the values together
• The list does not need to be sorted
• The list may contain any mix of lower- or upper-triangular indices

This format matches Julia's sparse-triplet form of a SparseArray, so we can convert from the sparse Hessian representation to a Julia SparseArray as follows:

I = [i for (i, _) in hessian_sparsity]
J = [j for (_, j) in hessian_sparsity]
V = zeros(length(hessian_sparsity))
n = num_variables(model)
H = SparseArrays.sparse(I, J, V, n, n)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
0.0   ⋅
0.0  0.0

Of course, knowing where the zeros are isn't very interesting. We really want to compute the value of the Hessian matrix at a point.

num_g = num_nl_constraints(model)
MOI.eval_hessian_lagrangian(d, V, ones(n), 1.0, ones(num_g))
H = SparseArrays.sparse(I, J, V, n, n)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
804.0     ⋅
-398.0  202.0

In practice, we often want to compute the value of the hessian at the optimal solution.

First, we compute the primal solution. To do so, we need a vector of the variables in the order that they were passed to the solver:

x = all_variables(model)
2-element Vector{VariableRef}:
x[1]
x[2]

Here x[1] is the variable that corresponds to column 1, and so on. Here's the optimal primal solution:

x_optimal = value.(x)
2-element Vector{Float64}:
0.7903587551555908
0.6238546250475736

Next, we need the optimal dual solution associated with the nonlinear constraints:

y_optimal = dual.(all_nl_constraints(model))
1-element Vector{Float64}:
-0.057440893636909414

Now we can compute the Hessian at the optimal primal-dual point:

MOI.eval_hessian_lagrangian(d, V, x_optimal, 1.0, y_optimal)
H = SparseArrays.sparse(I, J, V, n, n)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
501.944     ⋅
-316.258  199.885

However, this Hessian isn't quite right because it isn't symmetric. We can fix this by filling in the appropriate off-diagonal terms:

function fill_off_diagonal(H)
ret = H + H'
row_vals = SparseArrays.rowvals(ret)
non_zeros = SparseArrays.nonzeros(ret)
for col in 1:size(ret, 2)
for i in SparseArrays.nzrange(ret, col)
if col == row_vals[i]
non_zeros[i] /= 2
end
end
end
return ret
end

fill_off_diagonal(H)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 4 stored entries:
501.944  -316.258
-316.258   199.885

Moreover, this Hessian only accounts for the objective and constraints entered using @NLobjective and @NLconstraint. If we want to take quadratic objectives and constraints written using @objective or @constraint into account, we'll need to handle them separately.

Tip

If you don't want to do this, you can replace calls to @objective and @constraint with @NLobjective and @NLconstraint.

To compute the hessian from a quadratic expression, let's see how JuMP represents a quadratic constraint:

f = constraint_object(g_1).func
$$$x_{1}^2$$$

f is a quadratic expression of the form:

f(x) = Σqᵢⱼ * xᵢ * xⱼ + Σaᵢ xᵢ + c

So ∇²f(x) is the matrix formed by [qᵢⱼ]ᵢⱼ.

variables_to_column = Dict(x[i] => i for i in 1:n)

for (vars, coef) in f.terms
i = variables_to_column[vars.a]
j = variables_to_column[vars.b]
H[i, j] += μ * coef
end
return
end
add_to_hessian (generic function with 1 method)

If the function f is not a QuadExpr, do nothing because it is an AffExpr or a VariableRef. In both cases, the second derivative is zero.

add_to_hessian(H, f::Any, μ) = nothing
add_to_hessian (generic function with 2 methods)

Then we iterate over all constraints in the model and add their Hessian components:

for (F, S) in list_of_constraint_types(model)
for cref in all_constraints(model, F, S)
f = constraint_object(cref).func
end
end

H
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
501.944     ⋅
-316.258  199.885

Finally, we need to take into account the objective function:

add_to_hessian(H, objective_function(model), 1.0)

fill_off_diagonal(H)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 4 stored entries:
501.944  -316.258
-316.258   199.885

Putting everything together:

function compute_optimal_hessian(model)
d = NLPEvaluator(model)
MOI.initialize(d, [:Hess])
hessian_sparsity = MOI.hessian_lagrangian_structure(d)
I = [i for (i, _) in hessian_sparsity]
J = [j for (_, j) in hessian_sparsity]
V = zeros(length(hessian_sparsity))
x = all_variables(model)
x_optimal = value.(x)
y_optimal = dual.(all_nl_constraints(model))
MOI.eval_hessian_lagrangian(d, V, x_optimal, 1.0, y_optimal)
n = num_variables(model)
H = SparseArrays.sparse(I, J, V, n, n)
vmap = Dict(x[i] => i for i in 1:n)
for (vars, coef) in f.terms
H[vmap[vars.a], vmap[vars.b]] += μ * coef
end
end
for (F, S) in list_of_constraint_types(model)
for cref in all_constraints(model, F, S)
end
end
return Matrix(fill_off_diagonal(H))
end

H_star = compute_optimal_hessian(model)
2×2 Matrix{Float64}:
501.944  -316.258
-316.258   199.885

If we compare our solution against the analytical solution:

analytic_hessian(value.(x), 1.0, dual.([g_1, g_2]))
2×2 Matrix{Float64}:
501.944  -316.258
-316.258   199.885

If we look at the eigenvalues of the Hessian:

LinearAlgebra.eigvals(H_star)
2-element Vector{Float64}:
0.44439961542701667
701.3843409319256

we see that they are all positive. Therefore, the Hessian is positive definite, and so the solution found by Ipopt is a local minimizer.

Tip