# Computing Hessians

The purpose of this tutorial is to demonstrate how to compute the Hessian of the Lagrangian of a nonlinear program.

This is an advanced tutorial that interacts with the low-level nonlinear interface of MathOptInterface.

By default, JuMP exports the `MOI`

symbol as an alias for the MathOptInterface.jl package. We recommend making this more explicit in your code by adding the following lines:

```
import MathOptInterface
const MOI = MathOptInterface
```

Given a nonlinear program:

\[\begin{align} & \min_{x \in \mathbb{R}^n} & f(x) \\ & \;\;\text{s.t.} & l \le g_i(x) \le u \end{align}\]

the Hessian of the Lagrangian is computed as:

\[H(x, \sigma, \mu) = \sigma\nabla^2 f(x) + \sum_{i=1}^m \mu_i \nabla^2 g_i(x)\]

where $x$ is a primal point, $\sigma$ is a scalar (typically $1$), and $\mu$ is a vector of weights corresponding to the Lagrangian dual of the constraints.

This tutorial uses the following packages:

```
using JuMP
import Ipopt
import LinearAlgebra
import Random
import SparseArrays
```

## The basic model

To demonstrate how to interact with the lower-level nonlinear interface, we need an example model. The exact model isn't important; we use the model from The Rosenbrock function tutorial, with some additional constraints to demonstrate various features of the lower-level interface.

```
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[i = 1:2], start = -i)
@constraint(model, g_1, x[1]^2 <= 1)
@NLconstraint(model, g_2, (x[1] + x[2])^2 <= 2)
@NLobjective(model, Min, (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2)
optimize!(model)
```

## The analytic solution

With a little work, it is possible to analytically derive the correct hessian:

```
function analytic_hessian(x, σ, μ)
g_1_H = [2.0 0.0; 0.0 0.0]
g_2_H = [2.0 2.0; 2.0 2.0]
f_H = zeros(2, 2)
f_H[1, 1] = 2.0 + 1200.0 * x[1]^2 - 400.0 * x[2]
f_H[1, 2] = f_H[2, 1] = -400.0 * x[1]
f_H[2, 2] = 200.0
return σ * f_H + μ' * [g_1_H, g_2_H]
end
```

`analytic_hessian (generic function with 1 method)`

Here are various points:

`analytic_hessian([1, 1], 0, [0, 0])`

```
2×2 Matrix{Float64}:
0.0 0.0
0.0 0.0
```

`analytic_hessian([1, 1], 0, [1, 0])`

```
2×2 Matrix{Float64}:
2.0 0.0
0.0 0.0
```

`analytic_hessian([1, 1], 0, [0, 1])`

```
2×2 Matrix{Float64}:
2.0 2.0
2.0 2.0
```

`analytic_hessian([1, 1], 1, [0, 0])`

```
2×2 Matrix{Float64}:
802.0 -400.0
-400.0 200.0
```

## Initializing the NLPEvaluator

JuMP stores all information relating to the nonlinear portions of the model in a `NLPEvaluator`

struct:

`d = NLPEvaluator(model)`

```
Nonlinear.Evaluator with available features:
* :Grad
* :Jac
* :JacVec
* :Hess
* :HessVec
* :ExprGraph
```

Before computing anything with the NLPEvaluator, we need to initialize it. Use `MOI.features_available`

to see what we can query:

`MOI.features_available(d)`

```
6-element Vector{Symbol}:
:Grad
:Jac
:JacVec
:Hess
:HessVec
:ExprGraph
```

Consult the MOI documentation for specifics. But to obtain the Hessian matrix, we need to initialize `:Hess`

:

`MOI.initialize(d, [:Hess])`

MOI represents the Hessian as a sparse matrix. Get the sparsity pattern as follows:

`hessian_sparsity = MOI.hessian_lagrangian_structure(d)`

```
6-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 2)
(2, 1)
(1, 1)
(2, 2)
(2, 1)
```

The sparsity pattern has a few properties of interest:

- Each element
`(i, j)`

indicates a structural non-zero in row`i`

and column`j`

- The list may contain duplicates, in which case we should add the values together
- The list does not need to be sorted
- The list may contain any mix of lower- or upper-triangular indices

This format matches Julia's sparse-triplet form of a SparseArray, so we can convert from the sparse Hessian representation to a Julia SparseArray as follows:

```
I = [i for (i, _) in hessian_sparsity]
J = [j for (_, j) in hessian_sparsity]
V = zeros(length(hessian_sparsity))
n = num_variables(model)
H = SparseArrays.sparse(I, J, V, n, n)
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
0.0 ⋅
0.0 0.0
```

Of course, knowing where the zeros are isn't very interesting. We really want to compute the value of the Hessian matrix at a point.

```
num_g = num_nonlinear_constraints(model)
MOI.eval_hessian_lagrangian(d, V, ones(n), 1.0, ones(num_g))
H = SparseArrays.sparse(I, J, V, n, n)
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
804.0 ⋅
-398.0 202.0
```

In practice, we often want to compute the value of the hessian at the optimal solution.

First, we compute the primal solution. To do so, we need a vector of the variables in the order that they were passed to the solver:

`x = all_variables(model)`

```
2-element Vector{VariableRef}:
x[1]
x[2]
```

Here `x[1]`

is the variable that corresponds to column 1, and so on. Here's the optimal primal solution:

`x_optimal = value.(x)`

```
2-element Vector{Float64}:
0.7903587551555908
0.6238546250475736
```

Next, we need the optimal dual solution associated with the nonlinear constraints:

`y_optimal = dual.(all_nonlinear_constraints(model))`

```
1-element Vector{Float64}:
-0.057440893636909414
```

Now we can compute the Hessian at the optimal primal-dual point:

```
MOI.eval_hessian_lagrangian(d, V, x_optimal, 1.0, y_optimal)
H = SparseArrays.sparse(I, J, V, n, n)
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
501.944 ⋅
-316.258 199.885
```

However, this Hessian isn't quite right because it isn't symmetric. We can fix this by filling in the appropriate off-diagonal terms:

```
function fill_off_diagonal(H)
ret = H + H'
row_vals = SparseArrays.rowvals(ret)
non_zeros = SparseArrays.nonzeros(ret)
for col in 1:size(ret, 2)
for i in SparseArrays.nzrange(ret, col)
if col == row_vals[i]
non_zeros[i] /= 2
end
end
end
return ret
end
fill_off_diagonal(H)
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 4 stored entries:
501.944 -316.258
-316.258 199.885
```

Moreover, this Hessian only accounts for the objective and constraints entered using `@NLobjective`

and `@NLconstraint`

. If we want to take quadratic objectives and constraints written using `@objective`

or `@constraint`

into account, we'll need to handle them separately.

If you don't want to do this, you can replace calls to `@objective`

and `@constraint`

with `@NLobjective`

and `@NLconstraint`

.

## Hessians from QuadExpr functions

To compute the hessian from a quadratic expression, let's see how JuMP represents a quadratic constraint:

`f = constraint_object(g_1).func`

\[ x_{1}^2 \]

`f`

is a quadratic expression of the form:

`f(x) = Σqᵢⱼ * xᵢ * xⱼ + Σaᵢ xᵢ + c`

So `∇²f(x)`

is the matrix formed by `[qᵢⱼ]ᵢⱼ`

if `i != j`

and `2[qᵢⱼ]ᵢⱼ`

if `i = j`

.

```
variables_to_column = Dict(x[i] => i for i in 1:n)
function add_to_hessian(H, f::QuadExpr, μ)
for (vars, coef) in f.terms
i = variables_to_column[vars.a]
j = variables_to_column[vars.b]
H[i, j] += μ * coef
end
return
end
```

`add_to_hessian (generic function with 1 method)`

If the function `f`

is not a `QuadExpr`

, do nothing because it is an `AffExpr`

or a `VariableRef`

. In both cases, the second derivative is zero.

`add_to_hessian(H, f::Any, μ) = nothing`

`add_to_hessian (generic function with 2 methods)`

Then we iterate over all constraints in the model and add their Hessian components:

```
for (F, S) in list_of_constraint_types(model)
for cref in all_constraints(model, F, S)
f = constraint_object(cref).func
add_to_hessian(H, f, dual(cref))
end
end
H
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
501.944 ⋅
-316.258 199.885
```

Finally, we need to take into account the objective function:

```
add_to_hessian(H, objective_function(model), 1.0)
fill_off_diagonal(H)
```

```
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 4 stored entries:
501.944 -316.258
-316.258 199.885
```

Putting everything together:

```
function compute_optimal_hessian(model)
d = NLPEvaluator(model)
MOI.initialize(d, [:Hess])
hessian_sparsity = MOI.hessian_lagrangian_structure(d)
I = [i for (i, _) in hessian_sparsity]
J = [j for (_, j) in hessian_sparsity]
V = zeros(length(hessian_sparsity))
x = all_variables(model)
x_optimal = value.(x)
y_optimal = dual.(all_nonlinear_constraints(model))
MOI.eval_hessian_lagrangian(d, V, x_optimal, 1.0, y_optimal)
n = num_variables(model)
H = SparseArrays.sparse(I, J, V, n, n)
vmap = Dict(x[i] => i for i in 1:n)
add_to_hessian(H, f::Any, μ) = nothing
function add_to_hessian(H, f::QuadExpr, μ)
for (vars, coef) in f.terms
if vars.a != vars.b
H[vmap[vars.a], vmap[vars.b]] += μ * coef
else
H[vmap[vars.a], vmap[vars.b]] += 2 * μ * coef
end
end
end
for (F, S) in list_of_constraint_types(model)
for cref in all_constraints(model, F, S)
add_to_hessian(H, constraint_object(cref).func, dual(cref))
end
end
add_to_hessian(H, objective_function(model), 1.0)
return Matrix(fill_off_diagonal(H))
end
H_star = compute_optimal_hessian(model)
```

```
2×2 Matrix{Float64}:
501.944 -316.258
-316.258 199.885
```

If we compare our solution against the analytical solution:

`analytic_hessian(value.(x), 1.0, dual.([g_1, g_2]))`

```
2×2 Matrix{Float64}:
501.944 -316.258
-316.258 199.885
```

If we look at the eigenvalues of the Hessian:

`LinearAlgebra.eigvals(H_star)`

```
2-element Vector{Float64}:
0.44439959255495864
701.3843408744132
```

we see that they are all positive. Therefore, the Hessian is positive definite, and so the solution found by Ipopt is a local minimizer.

This tutorial was generated using Literate.jl. View the source `.jl`

file on GitHub.