Nonlinear

The Nonlinear submodule contains data structures and functions for working with a nonlinear optimization problem in the form of an expression graph. This page explains the API and describes the rationale behind its design.

Standard form

Nonlinear programs (NLPs) are a class of optimization problems in which some of the constraints or the objective function are nonlinear:

\[\begin{align} \min_{x \in \mathbb{R}^n} & f_0(x) \\ \;\;\text{s.t.} & l_j \le f_j(x) \le u_j & j = 1 \ldots m \end{align}\]

There may be additional constraints, as well as things like variable bounds and integrality restrictions, but we do not consider them here because they are best dealt with by other components of MathOptInterface.

API overview

The core element of the Nonlinear submodule is Nonlinear.Model:

julia> const Nonlinear = MOI.Nonlinear;

julia> model = Nonlinear.Model()
A Nonlinear.Model with:
 0 objectives
 0 parameters
 0 expressions
 0 constraints

Nonlinear.Model is a mutable struct that stores all of the nonlinear information added to the model.

Decision variables

Decision variables are represented by VariableIndexes. The user is responsible for creating these using MOI.VariableIndex(i), where i is the column associated with the variable.

Expressions

The input data structure is a Julia Expr. The input expressions can incorporate VariableIndexes, but these must be interpolated into the expression with $:

julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> input = :(1 + sin($x)^2)
:(1 + sin(MathOptInterface.VariableIndex(1)) ^ 2)

There are a number of restrictions on the input Expr:

It cannot contain macros
It cannot contain broadcasting
It cannot contain splatting (except in limited situations)
It cannot contain linear algebra, such as matrix-vector products
It cannot contain generator expressions, including sum(i for i in S)

Given an input expression, add an expression using Nonlinear.add_expression:

julia> expr = Nonlinear.add_expression(model, input)
MathOptInterface.Nonlinear.ExpressionIndex(1)

The return value, expr, is a Nonlinear.ExpressionIndex that can then be interpolated into other input expressions.

Looking again at model, we see:

julia> model
A Nonlinear.Model with:
 0 objectives
 0 parameters
 1 expression
 0 constraints

Parameters

In addition to constant literals like 1 or 1.23, you can create parameters. Parameters are placeholders whose values can change before passing the expression to the solver. Create a parameter using Nonlinear.add_parameter, which accepts a default value:

julia> p = Nonlinear.add_parameter(model, 1.23)
MathOptInterface.Nonlinear.ParameterIndex(1)

The return value, p, is a Nonlinear.ParameterIndex that can then be interpolated into other input expressions.

Looking again at model, we see:

julia> model
A Nonlinear.Model with:
 0 objectives
 1 parameter
 1 expression
 0 constraints

Update a parameter as follows:

julia> model[p]
1.23

julia> model[p] = 4.56
4.56

julia> model[p]
4.56

Objectives

Set a nonlinear objective using Nonlinear.set_objective:

julia> Nonlinear.set_objective(model, :($p + $expr + $x))

julia> model
A Nonlinear.Model with:
 1 objective
 1 parameter
 1 expression
 0 constraints

Clear a nonlinear objective by passing nothing:

julia> Nonlinear.set_objective(model, nothing)

julia> model
A Nonlinear.Model with:
 0 objectives
 1 parameter
 1 expression
 0 constraints

But we'll re-add the objective for later:

julia> Nonlinear.set_objective(model, :($p + $expr + $x));

Constraints

Add a constraint using Nonlinear.add_constraint:

julia> c = Nonlinear.add_constraint(model, :(1 + sqrt($x)), MOI.LessThan(2.0))
MathOptInterface.Nonlinear.ConstraintIndex(1)

julia> model
A Nonlinear.Model with:
 1 objective
 1 parameter
 1 expression
 1 constraint

The return value, c, is a Nonlinear.ConstraintIndex that is a unique identifier for the constraint. Interval constraints are also supported:

julia> c2 = Nonlinear.add_constraint(model, :(1 + sqrt($x)), MOI.Interval(-1.0, 2.0))
MathOptInterface.Nonlinear.ConstraintIndex(2)

julia> model
A Nonlinear.Model with:
 1 objective
 1 parameter
 1 expression
 2 constraints

Delete a constraint using Nonlinear.delete:

julia> Nonlinear.delete(model, c2)

julia> model
A Nonlinear.Model with:
 1 objective
 1 parameter
 1 expression
 1 constraint

User-defined operators

By default, Nonlinear supports a wide range of univariate and multivariate operators. However, you can also define your own operators by registering them.

Univariate operators

julia> f(x) = 1 + sin(x)^2
f (generic function with 1 method)

julia> Nonlinear.register_operator(model, :my_f, 1, f)

Now, you can use :my_f in expressions:

julia> new_expr = Nonlinear.add_expression(model, :(my_f($x + 1)))
MathOptInterface.Nonlinear.ExpressionIndex(2)

By default, Nonlinear will compute first- and second-derivatives of the registered operator using ForwardDiff.jl. Override this by passing functions which compute the respective derivative:

julia> f′(x) = 2 * sin(x) * cos(x)
f′ (generic function with 1 method)

julia> Nonlinear.register_operator(model, :my_f2, 1, f, f′)

julia> f′′(x) = 2 * (cos(x)^2 - sin(x)^2)
f′′ (generic function with 1 method)

julia> Nonlinear.register_operator(model, :my_f3, 1, f, f′, f′′)

Multivariate operators

julia> g(x...) = x[1]^2 + x[1] * x[2] + x[2]^2
g (generic function with 1 method)

julia> Nonlinear.register_operator(model, :my_g, 2, g)

Now, you can use :my_g in expressions:

julia> new_expr = Nonlinear.add_expression(model, :(my_g($x + 1, $x)))
MathOptInterface.Nonlinear.ExpressionIndex(3)

By default, Nonlinear will compute the gradient of the registered operator using ForwardDiff.jl. (Hessian information is not supported.) Override this by passing a function to compute the gradient:

julia> function ∇g(ret, x...)
           ret[1] = 2 * x[1] + x[2]
           ret[2] = x[1] + 2 * x[2]
           return
       end
∇g (generic function with 1 method)

julia> Nonlinear.register_operator(model, :my_g2, 2, g, ∇g)

MathOptInterface

MathOptInterface communicates the nonlinear portion of an optimization problem to solvers using concrete subtypes of AbstractNLPEvaluator, which implement the Nonlinear programming API.

Create an AbstractNLPEvaluator from Nonlinear.Model using Nonlinear.Evaluator.

Nonlinear.Evaluator requires an Nonlinear.AbstractAutomaticDifferentiation backend and an ordered list of the variables that are included in the model.

There following backends are available to choose from within MOI, although other packages may add more options by sub-typing Nonlinear.AbstractAutomaticDifferentiation:

julia> evaluator = Nonlinear.Evaluator(model, Nonlinear.ExprGraphOnly(), [x])
Nonlinear.Evaluator with available features:
  * :ExprGraph

The functions of the Nonlinear programming API implemented by Nonlinear.Evaluator depends upon the chosen Nonlinear.AbstractAutomaticDifferentiation backend.

The :ExprGraph feature means we can call objective_expr and constraint_expr to retrieve the expression graph of the problem. However, we cannot call gradient terms such as eval_objective_gradient because Nonlinear.ExprGraphOnly does not have the capability to differentiate a nonlinear expression.

If, instead, we pass Nonlinear.SparseReverseMode, then we get access to :Grad, the gradient of the objective function, :Jac, the Jacobian matrix of the constraints, :JacVec, the ability to compute Jacobian-vector products, and :ExprGraph.

julia> evaluator = Nonlinear.Evaluator(
           model,
           Nonlinear.SparseReverseMode(),
           [x],
       )
Nonlinear.Evaluator with available features:
  * :Grad
  * :Jac
  * :JacVec
  * :ExprGraph

However, before using the evaluator, we need to call initialize:

julia> MOI.initialize(evaluator, [:Grad, :Jac, :JacVec, :ExprGraph])

Now we can call methods like eval_objective:

julia> x = [1.0]
1-element Vector{Float64}:
 1.0

julia> MOI.eval_objective(evaluator, x)
7.268073418273571

and eval_objective_gradient:

julia> grad = [0.0]
1-element Vector{Float64}:
 0.0

julia> MOI.eval_objective_gradient(evaluator, grad, x)

julia> grad
1-element Vector{Float64}:
 1.909297426825682

Instead of passing Nonlinear.Evaluator directly to solvers, solvers query the NLPBlock attribute, which returns an NLPBlockData. This object wraps an Nonlinear.Evaluator and includes other information such as constraint bounds and whether the evaluator has a nonlinear objective. Create and set NLPBlockData as follows:

julia> block = MOI.NLPBlockData(evaluator);

julia> model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}());

julia> MOI.set(model, MOI.NLPBlock(), block);

Warning

Only call NLPBlockData once you have finished modifying the problem in model.

Putting everything together, you can create a nonlinear optimization problem in MathOptInterface as follows:

import MathOptInterface as MOI

function build_model(
    model::MOI.ModelLike;
    backend::MOI.Nonlinear.AbstractAutomaticDifferentiation,
)
    x = MOI.add_variable(model)
    y = MOI.add_variable(model)
    MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE)
    nl_model = MOI.Nonlinear.Model()
    MOI.Nonlinear.set_objective(nl_model, :($x^2 + $y^2))
    evaluator = MOI.Nonlinear.Evaluator(nl_model, backend, [x, y])
    MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(evaluator))
    return
end

# Replace `model` and `backend` with your optimizer and backend of choice.
model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())
build_model(model; backend = MOI.Nonlinear.SparseReverseMode())

Expression-graph representation

Nonlinear.Model stores nonlinear expressions in Nonlinear.Expressions. This section explains the design of the expression graph data structure in Nonlinear.Expression.

Given a nonlinear function like f(x) = sin(x)^2 + x, a conceptual aid for thinking about the graph representation of the expression is to convert it into Polish prefix notation:

f(x, y) = (+ (^ (sin x) 2) x)

This format identifies each operator (function), as well as a list of arguments. Operators can be univariate, like sin, or multivariate, like +.

A common way of representing Polish prefix notation in code is as follows:

julia> x = MOI.VariableIndex(1);

julia> struct ExprNode
           op::Symbol
           children::Vector{Union{ExprNode,Float64,MOI.VariableIndex}}
       end

julia> expr = ExprNode(:+, [ExprNode(:^, [ExprNode(:sin, [x]), 2.0]), x]);

This data structure follows our Polish prefix notation very closely, and we can easily identify the arguments to an operator. However, it has a significant draw-back: each node in the graph requires a Vector, which is heap-allocated and tracked by Julia's garbage collector (GC). For large models, we can expect to have millions of nodes in the expression graph, so this overhead quickly becomes prohibitive for computation.

An alternative is to record the expression as a linear tape:

julia> expr = Any[:+, 2, :^, 2, :sin, 1, x, 2.0, x]
9-element Vector{Any}:
  :+
 2
  :^
 2
  :sin
 1
  MOI.VariableIndex(1)
 2.0
  MOI.VariableIndex(1)

The Int after each operator Symbol specifies the number of arguments.

This data-structure is a single vector, which resolves our problem with the GC, but each element is the abstract type, Any, and so any operations on it will lead to slower dynamic dispatch. It's also hard to identify the children of each operation without reading the entire tape.

To summarize, representing expression graphs in Julia has the following challenges:

Nodes in the expression graph should not contain a heap-allocated object
All data-structures should be concretely typed
It should be easy to identify the children of a node

Sketch of the design in Nonlinear

Nonlinear overcomes these problems by decomposing the data structure into a number of different concrete-typed vectors.

First, we create vectors of the supported uni- and multivariate operators.

julia> const UNIVARIATE_OPERATORS = [:sin];

julia> const MULTIVARIATE_OPERATORS = [:+, :^];

In practice, there are many more supported operations than the ones listed here.

Second, we create an enum to represent the different types of nodes present in the expression graph:

julia> @enum(
           NodeType,
           NODE_CALL_MULTIVARIATE,
           NODE_CALL_UNIVARIATE,
           NODE_VARIABLE,
           NODE_VALUE,
       )

In practice, there are node types other than the ones listed here.

Third, we create two concretely typed structs as follows:

julia> struct Node
           type::NodeType
           parent::Int
           index::Int
       end

julia> struct Expression
           nodes::Vector{Node}
           values::Vector{Float64}
       end

For each node node in the .nodes field, if node.type is:

NODE_CALL_MULTIVARIATE, we look up MULTIVARIATE_OPERATORS[node.index] to retrieve the operator
NODE_CALL_UNIVARIATE, we look up UNIVARIATE_OPERATORS[node.index] to retrieve the operator
NODE_VARIABLE, we create MOI.VariableIndex(node.index)
NODE_VALUE, we look up values[node.index]

The .parent field of each node is the integer index of the parent node in .nodes. For the first node, the parent is -1 by convention.

Therefore, we can represent our function as:

julia> expr = Expression(
           [
               Node(NODE_CALL_MULTIVARIATE, -1, 1),
               Node(NODE_CALL_MULTIVARIATE, 1, 2),
               Node(NODE_CALL_UNIVARIATE, 2, 1),
               Node(NODE_VARIABLE, 3, 1),
               Node(NODE_VALUE, 2, 1),
               Node(NODE_VARIABLE, 1, 1),
           ],
           [2.0],
       );

The ordering of the nodes in the tape must satisfy two rules:

The children of a node must appear after the parent. This means that the tape is ordered topologically, so that a reverse pass of the nodes evaluates all children nodes before their parent
The arguments for a CALL node are ordered in the tape based on the order in which they appear in the function call.

Design goals

This is less readable than the other options, but does this data structure meet our design goals?

Instead of a heap-allocated object for each node, we only have two Vectors for each expression, nodes and values, as well as two constant vectors for the OPERATORS. In addition, all fields are concretely typed, and there are no Union or Any types.

For our third goal, it is not easy to identify the children of a node, but it is easy to identify the parent of any node. Therefore, we can use Nonlinear.adjacency_matrix to compute a sparse matrix that maps parents to their children.

The design in practice

In practice, Node and Expression are exactly Nonlinear.Node and Nonlinear.Expression. However, Nonlinear.NodeType has more fields to account for comparison operators such as :>= and :<=, logic operators such as :&& and :||, nonlinear parameters, and nested subexpressions.

Moreover, instead of storing the operators as global constants, they are stored in Nonlinear.OperatorRegistry, and it also stores a vector of logic operators and a vector of comparison operators. In addition to Nonlinear.DEFAULT_UNIVARIATE_OPERATORS and Nonlinear.DEFAULT_MULTIVARIATE_OPERATORS, you can register user-defined functions using Nonlinear.register_operator.

Nonlinear.Model is a struct that stores the Nonlinear.OperatorRegistry, as well as a list of parameters and subexpressions in the model.

ReverseAD

Nonlinear.ReverseAD is a submodule for computing derivatives of a nonlinear optimization problem using sparse reverse-mode automatic differentiation (AD).

This section does not attempt to explain how sparse reverse-mode AD works, but instead explains why MOI contains its own implementation, and highlights notable differences from similar packages.

Warning

Don't use the API in ReverseAD to compute derivatives. Instead, create a Nonlinear.Evaluator object with Nonlinear.SparseReverseMode as the backend, and then query the MOI API methods.

Design goals

The JuliaDiff organization maintains a list of packages for doing AD in Julia. At last count, there were at least ten packages——not including ReverseAD——for reverse-mode AD in Julia. ReverseAD exists because it has a different set of design goals.

Goal: handle scale and sparsity. The types of nonlinear optimization problems that MOI represents can be large scale (10^5 or more functions across 10^5 or more variables) with very sparse derivatives. The ability to compute a sparse Hessian matrix is essential. To the best of our knowledge, ReverseAD is the only reverse-mode AD system in Julia that handles sparsity by default.
Goal: limit the scope to improve robustness. Most other AD packages accept arbitrary Julia functions as input and then trace an expression graph using operator overloading. This means they must deal (or detect and ignore) with control flow, I/O, and other vagaries of Julia. In contrast, ReverseAD only accepts functions in the form of Nonlinear.Expression, which greatly limits the range of syntax that it must deal with. By reducing the scope of what we accept as input to functions relevant for mathematical optimization, we can provide a simpler implementation with various performance optimizations.
Goal: provide outputs which match what solvers expect. Other AD packages focus on differentiating individual Julia functions. In contrast, ReverseAD has a very specific use-case: to generate outputs needed by the MOI nonlinear API. This means it needs to efficiently compute sparse Hessians, and it needs subexpression handling to avoid recomputing subexpressions that are shared between functions.

History

ReverseAD started life as ReverseDiffSparse.jl, development of which began in early 2014(!). This was well before the other AD packages started development. Because we had a well-tested, working AD in JuMP, there was less motivation to contribute to and explore other AD packages. The lack of historical interaction also meant that other packages were not optimized for the types of problems that JuMP is built for (that is, large-scale sparse problems). When we first created MathOptInterface, we kept the AD in JuMP to simplify the transition, and post-poned the development of a first-class nonlinear interface in MathOptInterface.

Prior to the introduction of Nonlinear, JuMP's nonlinear implementation was a confusing mix of functions and types spread across the code base and in the private _Derivatives submodule. This made it hard to swap the AD system for another. The main motivation for refactoring JuMP to create the Nonlinear submodule in MathOptInterface was to abstract the interface between JuMP and the AD system, allowing us to swap-in and test new AD systems in the future.