Computing Hessians

The purpose of this tutorial is to demonstrate how to compute the Hessian of the Lagrangian of a nonlinear program.

Warning

This is an advanced tutorial that interacts with the low-level nonlinear interface of MathOptInterface.

By default, JuMP exports the MOI symbol as an alias for the MathOptInterface.jl package. We recommend making this more explicit in your code by adding the following lines:

import MathOptInterface
const MOI = MathOptInterface

Given a nonlinear program:

\[\begin{align} & \min_{x \in \mathbb{R}^n} & f(x) \\ & \;\;\text{s.t.} & l \le g_i(x) \le u \end{align}\]

the Hessian of the Lagrangian is computed as:

\[H(x, \sigma, \mu) = \sigma\nabla^2 f(x) + \sum_{i=1}^m \mu_i \nabla^2 g_i(x)\]

where $x$ is a primal point, $\sigma$ is a scalar (typically $1$), and $\mu$ is a vector of weights corresponding to the Lagrangian dual of the constraints.

This tutorial uses the following packages:

using JuMP
import Ipopt
import LinearAlgebra
import Random
import SparseArrays

The basic model

To demonstrate how to interact with the lower-level nonlinear interface, we need an example model. The exact model isn't important; we use the model from The Rosenbrock function tutorial, with some additional constraints to demonstrate various features of the lower-level interface.

model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[i = 1:2], start = -i)
@constraint(model, g_1, x[1]^2 <= 1)
@NLconstraint(model, g_2, (x[1] + x[2])^2 <= 2)
@NLobjective(model, Min, (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2)
optimize!(model)

The analytic solution

With a little work, it is possible to analytically derive the correct hessian:

function analytic_hessian(x, σ, μ)
    g_1_H = [2.0 0.0; 0.0 0.0]
    g_2_H = [2.0 2.0; 2.0 2.0]
    f_H = zeros(2, 2)
    f_H[1, 1] = 2.0 + 1200.0 * x[1]^2 - 400.0 * x[2]
    f_H[1, 2] = f_H[2, 1] = -400.0 * x[1]
    f_H[2, 2] = 200.0
    return σ * f_H + μ' * [g_1_H, g_2_H]
end
analytic_hessian (generic function with 1 method)

Here are various points:

analytic_hessian([1, 1], 0, [0, 0])
2×2 Matrix{Float64}:
 0.0  0.0
 0.0  0.0
analytic_hessian([1, 1], 0, [1, 0])
2×2 Matrix{Float64}:
 2.0  0.0
 0.0  0.0
analytic_hessian([1, 1], 0, [0, 1])
2×2 Matrix{Float64}:
 2.0  2.0
 2.0  2.0
analytic_hessian([1, 1], 1, [0, 0])
2×2 Matrix{Float64}:
  802.0  -400.0
 -400.0   200.0

Initializing the NLPEvaluator

JuMP stores all information relating to the nonlinear portions of the model in a NLPEvaluator struct:

d = NLPEvaluator(model)
Nonlinear.Evaluator with available features:
  * :Grad
  * :Jac
  * :JacVec
  * :Hess
  * :HessVec
  * :ExprGraph

Before computing anything with the NLPEvaluator, we need to initialize it. Use MOI.features_available to see what we can query:

MOI.features_available(d)
6-element Vector{Symbol}:
 :Grad
 :Jac
 :JacVec
 :Hess
 :HessVec
 :ExprGraph

Consult the MOI documentation for specifics. But to obtain the Hessian matrix, we need to initialize :Hess:

MOI.initialize(d, [:Hess])

MOI represents the Hessian as a sparse matrix. Get the sparsity pattern as follows:

hessian_sparsity = MOI.hessian_lagrangian_structure(d)
6-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (2, 2)
 (2, 1)
 (1, 1)
 (2, 2)
 (2, 1)

The sparsity pattern has a few properties of interest:

  • Each element (i, j) indicates a structural non-zero in row i and column j
  • The list may contain duplicates, in which case we should add the values together
  • The list does not need to be sorted
  • The list may contain any mix of lower- or upper-triangular indices

This format matches Julia's sparse-triplet form of a SparseArray, so we can convert from the sparse Hessian representation to a Julia SparseArray as follows:

I = [i for (i, _) in hessian_sparsity]
J = [j for (_, j) in hessian_sparsity]
V = zeros(length(hessian_sparsity))
n = num_variables(model)
H = SparseArrays.sparse(I, J, V, n, n)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
 0.0   ⋅ 
 0.0  0.0

Of course, knowing where the zeros are isn't very interesting. We really want to compute the value of the Hessian matrix at a point.

num_g = num_nonlinear_constraints(model)
MOI.eval_hessian_lagrangian(d, V, ones(n), 1.0, ones(num_g))
H = SparseArrays.sparse(I, J, V, n, n)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
  804.0     ⋅ 
 -398.0  202.0

In practice, we often want to compute the value of the hessian at the optimal solution.

First, we compute the primal solution. To do so, we need a vector of the variables in the order that they were passed to the solver:

x = all_variables(model)
2-element Vector{VariableRef}:
 x[1]
 x[2]

Here x[1] is the variable that corresponds to column 1, and so on. Here's the optimal primal solution:

x_optimal = value.(x)
2-element Vector{Float64}:
 0.7903587551555908
 0.6238546250475736

Next, we need the optimal dual solution associated with the nonlinear constraints:

y_optimal = dual.(all_nonlinear_constraints(model))
1-element Vector{Float64}:
 -0.057440893636909414

Now we can compute the Hessian at the optimal primal-dual point:

MOI.eval_hessian_lagrangian(d, V, x_optimal, 1.0, y_optimal)
H = SparseArrays.sparse(I, J, V, n, n)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
  501.944     ⋅ 
 -316.258  199.885

However, this Hessian isn't quite right because it isn't symmetric. We can fix this by filling in the appropriate off-diagonal terms:

function fill_off_diagonal(H)
    ret = H + H'
    row_vals = SparseArrays.rowvals(ret)
    non_zeros = SparseArrays.nonzeros(ret)
    for col in 1:size(ret, 2)
        for i in SparseArrays.nzrange(ret, col)
            if col == row_vals[i]
                non_zeros[i] /= 2
            end
        end
    end
    return ret
end

fill_off_diagonal(H)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 4 stored entries:
  501.944  -316.258
 -316.258   199.885

Moreover, this Hessian only accounts for the objective and constraints entered using @NLobjective and @NLconstraint. If we want to take quadratic objectives and constraints written using @objective or @constraint into account, we'll need to handle them separately.

Tip

If you don't want to do this, you can replace calls to @objective and @constraint with @NLobjective and @NLconstraint.

Hessians from QuadExpr functions

To compute the hessian from a quadratic expression, let's see how JuMP represents a quadratic constraint:

f = constraint_object(g_1).func

\[ x_{1}^2 \]

f is a quadratic expression of the form:

f(x) = Σqᵢⱼ * xᵢ * xⱼ + Σaᵢ xᵢ + c

So ∇²f(x) is the matrix formed by [qᵢⱼ]ᵢⱼ if i != j and 2[qᵢⱼ]ᵢⱼ if i = j.

variables_to_column = Dict(x[i] => i for i in 1:n)

function add_to_hessian(H, f::QuadExpr, μ)
    for (vars, coef) in f.terms
        i = variables_to_column[vars.a]
        j = variables_to_column[vars.b]
        H[i, j] += μ * coef
    end
    return
end
add_to_hessian (generic function with 1 method)

If the function f is not a QuadExpr, do nothing because it is an AffExpr or a VariableRef. In both cases, the second derivative is zero.

add_to_hessian(H, f::Any, μ) = nothing
add_to_hessian (generic function with 2 methods)

Then we iterate over all constraints in the model and add their Hessian components:

for (F, S) in list_of_constraint_types(model)
    for cref in all_constraints(model, F, S)
        f = constraint_object(cref).func
        add_to_hessian(H, f, dual(cref))
    end
end

H
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
  501.944     ⋅ 
 -316.258  199.885

Finally, we need to take into account the objective function:

add_to_hessian(H, objective_function(model), 1.0)

fill_off_diagonal(H)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 4 stored entries:
  501.944  -316.258
 -316.258   199.885

Putting everything together:

function compute_optimal_hessian(model)
    d = NLPEvaluator(model)
    MOI.initialize(d, [:Hess])
    hessian_sparsity = MOI.hessian_lagrangian_structure(d)
    I = [i for (i, _) in hessian_sparsity]
    J = [j for (_, j) in hessian_sparsity]
    V = zeros(length(hessian_sparsity))
    x = all_variables(model)
    x_optimal = value.(x)
    y_optimal = dual.(all_nonlinear_constraints(model))
    MOI.eval_hessian_lagrangian(d, V, x_optimal, 1.0, y_optimal)
    n = num_variables(model)
    H = SparseArrays.sparse(I, J, V, n, n)
    vmap = Dict(x[i] => i for i in 1:n)
    add_to_hessian(H, f::Any, μ) = nothing
    function add_to_hessian(H, f::QuadExpr, μ)
        for (vars, coef) in f.terms
            if vars.a != vars.b
                H[vmap[vars.a], vmap[vars.b]] += μ * coef
            else
                H[vmap[vars.a], vmap[vars.b]] += 2 * μ * coef
            end
        end
    end
    for (F, S) in list_of_constraint_types(model)
        for cref in all_constraints(model, F, S)
            add_to_hessian(H, constraint_object(cref).func, dual(cref))
        end
    end
    add_to_hessian(H, objective_function(model), 1.0)
    return Matrix(fill_off_diagonal(H))
end

H_star = compute_optimal_hessian(model)
2×2 Matrix{Float64}:
  501.944  -316.258
 -316.258   199.885

If we compare our solution against the analytical solution:

analytic_hessian(value.(x), 1.0, dual.([g_1, g_2]))
2×2 Matrix{Float64}:
  501.944  -316.258
 -316.258   199.885

If we look at the eigenvalues of the Hessian:

LinearAlgebra.eigvals(H_star)
2-element Vector{Float64}:
   0.44439959255495864
 701.3843408744132

we see that they are all positive. Therefore, the Hessian is positive definite, and so the solution found by Ipopt is a local minimizer.


Tip

This tutorial was generated using Literate.jl. View the source .jl file on GitHub.