Nonlinear
The Nonlinear submodule is experimental. Until this message is removed, breaking changes may be introduced in any minor or patch release of MathOptInterface.
The Nonlinear submodule contains data structures and functions for working with a nonlinear optimization problem in the form of an expression graph. This page explains the API and describes the rationale behind its design.
Standard form
Nonlinear programs (NLPs) are a class of optimization problems in which some of the constraints or the objective function are nonlinear:
\[\begin{align} \min_{x \in \mathbb{R}^n} & f_0(x) \\ \;\;\text{s.t.} & l_j \le f_j(x) \le u_j & j = 1 \ldots m \end{align}\]
There may be additional constraints, as well as things like variable bounds and integrality restrictions, but we do not consider them here because they are best dealt with by other components of MathOptInterface.
API overview
The core element of the Nonlinear submodule is Nonlinear.Model:
julia> const Nonlinear = MOI.Nonlinear;
julia> model = Nonlinear.Model()
A Nonlinear.Model with:
0 objectives
0 parameters
0 expressions
0 constraints
Nonlinear.Model is a mutable struct that stores all of the nonlinear information added to the model.
Decision variables
Decision variables are represented by VariableIndexes. The user is responsible for creating these using MOI.VariableIndex(i), where i is the column associated with the variable.
Expressions
The input data structure is a Julia Expr. The input expressions can incorporate VariableIndexes, but these must be interpolated into the expression with $:
julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)
julia> input = :(1 + sin($x)^2)
:(1 + sin(MathOptInterface.VariableIndex(1)) ^ 2)
There are a number of restrictions on the input Expr:
- It cannot contain macros
- It cannot contain broadcasting
- It cannot contain splatting (except in limited situations)
- It cannot contain linear algebra, such as matrix-vector products
- It cannot contain generator expressions, including sum(i for i in S) (see the workaround sketched below)
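If a function is most naturally written with a generator, one workaround is to construct the input Expr programmatically before handing it to Nonlinear. The following is a minimal sketch; the vector xs of variable indices is hypothetical, and Expr(:call, :+, args...) splices an explicit argument list into a single call:

julia> xs = [MOI.VariableIndex(i) for i in 1:3];

julia> input = Expr(:call, :+, [:(sin($xi)) for xi in xs]...)
:(sin(MathOptInterface.VariableIndex(1)) + sin(MathOptInterface.VariableIndex(2)) + sin(MathOptInterface.VariableIndex(3)))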
Given an input expression, add an expression using Nonlinear.add_expression:
julia> expr = Nonlinear.add_expression(model, input)
MathOptInterface.Nonlinear.ExpressionIndex(1)
The return value, expr, is a Nonlinear.ExpressionIndex that can then be interpolated into other input expressions.
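For example, expr can appear inside a new input expression. A small sketch (this composite expression is illustrative and is not added to the model):

julia> input_2 = :(log($expr) + $x)
:(log(MathOptInterface.Nonlinear.ExpressionIndex(1)) + MathOptInterface.VariableIndex(1))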
Looking again at model, we see:
julia> model
A Nonlinear.Model with:
0 objectives
0 parameters
1 expression
0 constraints
Parameters
In addition to constant literals like 1 or 1.23, you can create parameters. Parameters are placeholders whose values can change before passing the expression to the solver. Create a parameter using Nonlinear.add_parameter, which accepts a default value:
julia> p = Nonlinear.add_parameter(model, 1.23)
MathOptInterface.Nonlinear.ParameterIndex(1)
The return value, p, is a Nonlinear.ParameterIndex that can then be interpolated into other input expressions.
Looking again at model, we see:
julia> model
A Nonlinear.Model with:
0 objectives
1 parameter
1 expression
0 constraints
Update a parameter as follows:
julia> model[p]
1.23
julia> model[p] = 4.56
4.56
julia> model[p]
4.56
Objectives
Set a nonlinear objective using Nonlinear.set_objective:
julia> Nonlinear.set_objective(model, :($p + $expr + $x))
julia> model
A Nonlinear.Model with:
1 objective
1 parameter
1 expression
0 constraints
Clear a nonlinear objective by passing nothing:
julia> Nonlinear.set_objective(model, nothing)
julia> model
A Nonlinear.Model with:
0 objectives
1 parameter
1 expression
0 constraints
But we'll re-add the objective for later:
julia> Nonlinear.set_objective(model, :($p + $expr + $x));
Constraints
Add a constraint using Nonlinear.add_constraint:
julia> c = Nonlinear.add_constraint(model, :(1 + sqrt($x)), MOI.LessThan(2.0))
MathOptInterface.Nonlinear.ConstraintIndex(1)
julia> model
A Nonlinear.Model with:
1 objective
1 parameter
1 expression
1 constraint
The return value, c, is a Nonlinear.ConstraintIndex that is a unique identifier for the constraint. Interval constraints are also supported:
julia> c2 = Nonlinear.add_constraint(model, :(1 + sqrt($x)), MOI.Interval(-1.0, 2.0))
MathOptInterface.Nonlinear.ConstraintIndex(2)
julia> model
A Nonlinear.Model with:
1 objective
1 parameter
1 expression
2 constraints
Delete a constraint using Nonlinear.delete:
julia> Nonlinear.delete(model, c2)
julia> model
A Nonlinear.Model with:
1 objective
1 parameter
1 expression
1 constraint
User-defined operators
By default, Nonlinear supports a wide range of univariate and multivariate operators. However, you can also define your own operators by registering them.
Univariate operators
Register a univariate user-defined operator using Nonlinear.register_operator:
julia> f(x) = 1 + sin(x)^2
f (generic function with 1 method)
julia> Nonlinear.register_operator(model, :my_f, 1, f)
Now, you can use :my_f in expressions:
julia> new_expr = Nonlinear.add_expression(model, :(my_f($x + 1)))
MathOptInterface.Nonlinear.ExpressionIndex(2)
By default, Nonlinear will compute first- and second-derivatives of the registered operator using ForwardDiff.jl. Override this by passing functions which compute the respective derivative:
julia> f′(x) = 2 * sin(x) * cos(x)
f′ (generic function with 1 method)
julia> Nonlinear.register_operator(model, :my_f2, 1, f, f′)
or
julia> f′′(x) = 2 * (cos(x)^2 - sin(x)^2)
f′′ (generic function with 1 method)
julia> Nonlinear.register_operator(model, :my_f3, 1, f, f′, f′′)
Multivariate operators
Register a multivariate user-defined operator using Nonlinear.register_operator:
julia> g(x...) = x[1]^2 + x[1] * x[2] + x[2]^2
g (generic function with 1 method)
julia> Nonlinear.register_operator(model, :my_g, 2, g)
Now, you can use :my_g in expressions:
julia> new_expr = Nonlinear.add_expression(model, :(my_g($x + 1, $x)))
MathOptInterface.Nonlinear.ExpressionIndex(3)
By default, Nonlinear will compute the gradient of the registered operator using ForwardDiff.jl. (Hessian information is not supported.) Override this by passing a function to compute the gradient:
julia> function ∇g(ret, x...)
ret[1] = 2 * x[1] + x[2]
ret[2] = x[1] + 2 * x[2]
return
end
∇g (generic function with 1 method)
julia> Nonlinear.register_operator(model, :my_g2, 2, g, ∇g)
MathOptInterface
MathOptInterface communicates the nonlinear portion of an optimization problem to solvers using concrete subtypes of AbstractNLPEvaluator, which implement the Nonlinear programming API.
Create an AbstractNLPEvaluator from Nonlinear.Model using Nonlinear.Evaluator.
Nonlinear.Evaluator requires a Nonlinear.AbstractAutomaticDifferentiation backend and an ordered list of the variables that are included in the model.
Two backends are available within MOI, Nonlinear.ExprGraphOnly and Nonlinear.SparseReverseMode, although other packages may add more options by subtyping Nonlinear.AbstractAutomaticDifferentiation:
julia> evaluator = Nonlinear.Evaluator(model, Nonlinear.ExprGraphOnly(), [x])
Nonlinear.Evaluator with available features:
* :ExprGraph
The functions of the Nonlinear programming API implemented by Nonlinear.Evaluator depend upon the chosen Nonlinear.AbstractAutomaticDifferentiation backend.
The :ExprGraph feature means we can call objective_expr and constraint_expr to retrieve the expression graph of the problem. However, we cannot call derivative methods such as eval_objective_gradient because Nonlinear.ExprGraphOnly does not have the capability to differentiate a nonlinear expression.
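For example, the following sketch initializes the evaluator with the :ExprGraph feature and then recovers the objective and the first constraint as Julia Expr objects (outputs suppressed here, because the exact form of the returned expressions may vary between MOI versions):

julia> MOI.initialize(evaluator, [:ExprGraph])

julia> obj = MOI.objective_expr(evaluator);

julia> con_1 = MOI.constraint_expr(evaluator, 1);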
If, instead, we pass Nonlinear.SparseReverseMode, then we get access to :Grad, the gradient of the objective function, :Jac, the Jacobian matrix of the constraints, :JacVec, the ability to compute Jacobian-vector products, and :ExprGraph.
julia> evaluator = Nonlinear.Evaluator(
model,
Nonlinear.SparseReverseMode(),
[x],
)
Nonlinear.Evaluator with available features:
* :Grad
* :Jac
* :JacVec
* :ExprGraph
However, before using the evaluator, we need to call initialize:
julia> MOI.initialize(evaluator, [:Grad, :Jac, :JacVec, :ExprGraph])
Now we can call methods like eval_objective:
julia> x = [1.0]
1-element Vector{Float64}:
1.0
julia> MOI.eval_objective(evaluator, x)
7.268073418273571
julia> grad = [0.0]
1-element Vector{Float64}:
0.0
julia> MOI.eval_objective_gradient(evaluator, grad, x)
julia> grad
1-element Vector{Float64}:
1.909297426825682
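Because we also initialized :Jac, we can query the sparsity structure of the constraint Jacobian and then fill in its nonzero values. In this sketch, the only remaining constraint is 1 + sqrt(x) <= 2 from above, so at x = 1.0 the single Jacobian entry is d/dx(1 + sqrt(x)) = 0.5:

julia> MOI.jacobian_structure(evaluator)
1-element Vector{Tuple{Int64, Int64}}:
 (1, 1)

julia> J = zeros(1);

julia> MOI.eval_constraint_jacobian(evaluator, J, x);

julia> J
1-element Vector{Float64}:
 0.5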
Instead of passing Nonlinear.Evaluator directly to solvers, solvers query the NLPBlock attribute, which returns an NLPBlockData. This object wraps a Nonlinear.Evaluator and includes other information such as constraint bounds and whether the evaluator has a nonlinear objective. Create and set NLPBlockData as follows:
julia> block = MOI.NLPBlockData(evaluator);
julia> model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}());
julia> MOI.set(model, MOI.NLPBlock(), block);
Only call NLPBlockData once you have finished modifying the problem in model.
Putting everything together, you can create a nonlinear optimization problem in MathOptInterface as follows:
import MathOptInterface as MOI
function build_model(
model::MOI.ModelLike;
backend::MOI.Nonlinear.AbstractAutomaticDifferentiation,
)
x = MOI.add_variable(model)
y = MOI.add_variable(model)
MOI.set(model, MOI.ObjectiveSense(), MOI.MIN_SENSE)
nl_model = MOI.Nonlinear.Model()
MOI.Nonlinear.set_objective(nl_model, :($x^2 + $y^2))
evaluator = MOI.Nonlinear.Evaluator(nl_model, backend, [x, y])
MOI.set(model, MOI.NLPBlock(), MOI.NLPBlockData(evaluator))
return
end
# Replace `model` and `backend` with your optimizer and backend of choice.
model = MOI.Utilities.UniversalFallback(MOI.Utilities.Model{Float64}())
build_model(model; backend = MOI.Nonlinear.SparseReverseMode())
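As a usage sketch (assuming the Ipopt.jl package is installed; any optimizer that supports the NLPBlock attribute will work), the model can then be solved directly:

# A sketch: solve min x^2 + y^2 with Ipopt.
import Ipopt
model = Ipopt.Optimizer()
build_model(model; backend = MOI.Nonlinear.SparseReverseMode())
MOI.optimize!(model)
MOI.get(model, MOI.ObjectiveValue())  # ≈ 0.0 at the unconstrained minimum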
Expression-graph representation
Nonlinear.Model stores nonlinear expressions in Nonlinear.Expressions. This section explains the design of the expression graph data structure in Nonlinear.Expression.
Given a nonlinear function like f(x) = sin(x)^2 + x, a conceptual aid for thinking about the graph representation of the expression is to convert it into Polish prefix notation:
f(x) = (+ (^ (sin x) 2) x)
This format identifies each operator (function), as well as a list of arguments. Operators can be univariate, like sin, or multivariate, like +.
A common way of representing Polish prefix notation in code is as follows:
julia> x = MOI.VariableIndex(1);
julia> struct ExprNode
op::Symbol
children::Vector{Union{ExprNode,Float64,MOI.VariableIndex}}
end
julia> expr = ExprNode(:+, [ExprNode(:^, [ExprNode(:sin, [x]), 2.0]), x]);
This data structure follows our Polish prefix notation very closely, and we can easily identify the arguments to an operator. However, it has a significant drawback: each node in the graph requires a Vector, which is heap-allocated and tracked by Julia's garbage collector (GC). For large models, we can expect to have millions of nodes in the expression graph, so this overhead quickly becomes prohibitive for computation.
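Before moving on, here is a minimal sketch showing why this recursive representation is otherwise easy to work with. The evaluate function is hypothetical, and it assumes every operator Symbol names a function defined in Base:

julia> function evaluate(node::ExprNode, x_value::Float64)
           f = getfield(Base, node.op)  # for example, :sin -> Base.sin
           args = map(node.children) do child
               child isa ExprNode ? evaluate(child, x_value) :
               child isa MOI.VariableIndex ? x_value : child
           end
           return f(args...)
       end
evaluate (generic function with 1 method)

julia> evaluate(expr, 1.0) ≈ 1 + sin(1.0)^2
true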
An alternative is to record the expression as a linear tape:
julia> expr = Any[:+, 2, :^, 2, :sin, 1, x, 2.0, x]
9-element Vector{Any}:
:+
2
:^
2
:sin
1
MOI.VariableIndex(1)
2.0
MOI.VariableIndex(1)
The Int after each operator Symbol specifies the number of arguments.
This data structure is a single vector, which resolves our problem with the GC, but each element is the abstract type Any, and so any operations on it will lead to slower dynamic dispatch. It's also hard to identify the children of each operation without reading the entire tape.
To summarize, representing expression graphs in Julia has the following challenges:
- Nodes in the expression graph should not contain a heap-allocated object
- All data-structures should be concretely typed
- It should be easy to identify the children of a node
Sketch of the design in Nonlinear
Nonlinear overcomes these problems by decomposing the data structure into a number of different concretely typed vectors.
First, we create vectors of the supported uni- and multivariate operators.
julia> const UNIVARIATE_OPERATORS = [:sin];
julia> const MULTIVARIATE_OPERATORS = [:+, :^];
In practice, there are many more supported operations than the ones listed here.
Second, we create an enum to represent the different types of nodes present in the expression graph:
julia> @enum(
NodeType,
NODE_CALL_MULTIVARIATE,
NODE_CALL_UNIVARIATE,
NODE_VARIABLE,
NODE_VALUE,
)
In practice, there are node types other than the ones listed here.
Third, we create two concretely typed structs as follows:
julia> struct Node
type::NodeType
parent::Int
index::Int
end
julia> struct Expression
nodes::Vector{Node}
values::Vector{Float64}
end
For each node node in the .nodes field, if node.type is:
- NODE_CALL_MULTIVARIATE, we look up MULTIVARIATE_OPERATORS[node.index] to retrieve the operator
- NODE_CALL_UNIVARIATE, we look up UNIVARIATE_OPERATORS[node.index] to retrieve the operator
- NODE_VARIABLE, we create MOI.VariableIndex(node.index)
- NODE_VALUE, we look up values[node.index]
The .parent field of each node is the integer index of the parent node in .nodes. For the first node, the parent is -1 by convention.
Therefore, we can represent our function as:
julia> expr = Expression(
[
Node(NODE_CALL_MULTIVARIATE, -1, 1),
Node(NODE_CALL_MULTIVARIATE, 1, 2),
Node(NODE_CALL_UNIVARIATE, 2, 1),
Node(NODE_VARIABLE, 3, 1),
Node(NODE_VALUE, 2, 1),
Node(NODE_VARIABLE, 1, 1),
],
[2.0],
);
The ordering of the nodes in the tape must satisfy two rules:
- The children of a node must appear after the parent. This means that the tape is ordered topologically, so that a reverse pass of the nodes evaluates all children before their parent (the sketch below makes this evaluation order concrete).
- The arguments for a CALL node are ordered in the tape based on the order in which they appear in the function call.
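To make these rules concrete, here is a minimal, intentionally naive evaluator for our toy Expression type. It walks the tape in reverse so that children are evaluated before their parents, and it finds the children of a node by scanning the parent fields. The evaluate_tape function is hypothetical; a real implementation would use the adjacency structure instead of rescanning the tape:

julia> function evaluate_tape(expr::Expression, x_value::Float64)
           storage = zeros(length(expr.nodes))
           for i in reverse(1:length(expr.nodes))
               node = expr.nodes[i]
               if node.type == NODE_VARIABLE
                   storage[i] = x_value
               elseif node.type == NODE_VALUE
                   storage[i] = expr.values[node.index]
               else
                   if node.type == NODE_CALL_UNIVARIATE
                       op = UNIVARIATE_OPERATORS[node.index]
                   else
                       op = MULTIVARIATE_OPERATORS[node.index]
                   end
                   # Children appear in the tape in argument order (rule 2).
                   args = [storage[j] for j in 1:length(expr.nodes) if expr.nodes[j].parent == i]
                   storage[i] = getfield(Base, op)(args...)
               end
           end
           return storage[1]  # the root is the first node
       end
evaluate_tape (generic function with 1 method)

julia> evaluate_tape(expr, 1.0) ≈ 1 + sin(1.0)^2
true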
Design goals
This is less readable than the other options, but does this data structure meet our design goals?
Instead of a heap-allocated object for each node, we only have two Vectors for each expression, nodes and values, as well as two constant vectors for the OPERATORS. In addition, all fields are concretely typed, and there are no Union or Any types.
For our third goal, it is not easy to identify the children of a node, but it is easy to identify the parent of any node. Therefore, we can use Nonlinear.adjacency_matrix to compute a sparse matrix that maps parents to their children.
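As a conceptual sketch of that idea (the children_map function here is hypothetical, not the actual implementation of Nonlinear.adjacency_matrix), a parent-to-child map can be built in a single pass over the nodes:

julia> using SparseArrays

julia> function children_map(nodes::Vector{Node})
           I, J = Int[], Int[]  # I[k] = parent row, J[k] = child column
           for (j, node) in enumerate(nodes)
               if node.parent >= 1
                   push!(I, node.parent)
                   push!(J, j)
               end
           end
           n = length(nodes)
           return sparse(I, J, trues(length(I)), n, n)
       end
children_map (generic function with 1 method)

julia> findall(children_map(expr.nodes)[2, :])  # children of node 2, the ^ call
2-element Vector{Int64}:
 3
 5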
The design in practice
In practice, Node and Expression are exactly Nonlinear.Node and Nonlinear.Expression. However, Nonlinear.NodeType has more fields to account for comparison operators such as :>= and :<=, logic operators such as :&& and :||, nonlinear parameters, and nested subexpressions.
Moreover, instead of storing the operators as global constants, they are stored in Nonlinear.OperatorRegistry, which also stores a vector of logic operators and a vector of comparison operators. In addition to Nonlinear.DEFAULT_UNIVARIATE_OPERATORS and Nonlinear.DEFAULT_MULTIVARIATE_OPERATORS, you can register user-defined functions using Nonlinear.register_operator.
Nonlinear.Model is a struct that stores the Nonlinear.OperatorRegistry, as well as a list of parameters and subexpressions in the model.
ReverseAD
Nonlinear.ReverseAD is a submodule for computing derivatives of a nonlinear optimization problem using sparse reverse-mode automatic differentiation (AD).
This section does not attempt to explain how sparse reverse-mode AD works, but instead explains why MOI contains its own implementation, and highlights notable differences from similar packages.
Don't use the API in ReverseAD to compute derivatives. Instead, create a Nonlinear.Evaluator object with Nonlinear.SparseReverseMode as the backend, and then query the MOI API methods.
Design goals
The JuliaDiff organization maintains a list of packages for doing AD in Julia. At last count, there were at least ten packages (not including ReverseAD) for reverse-mode AD in Julia. ReverseAD exists because it has a different set of design goals.
- Goal: handle scale and sparsity. The types of nonlinear optimization problems that MOI represents can be large scale (10^5 or more functions across 10^5 or more variables) with very sparse derivatives. The ability to compute a sparse Hessian matrix is essential. To the best of our knowledge, ReverseAD is the only reverse-mode AD system in Julia that handles sparsity by default.
- Goal: limit the scope to improve robustness. Most other AD packages accept arbitrary Julia functions as input and then trace an expression graph using operator overloading. This means they must deal with (or detect and ignore) control flow, I/O, and other vagaries of Julia. In contrast, ReverseAD only accepts functions in the form of Nonlinear.Expression, which greatly limits the range of syntax that it must deal with. By reducing the scope of what we accept as input to functions relevant for mathematical optimization, we can provide a simpler implementation with various performance optimizations.
- Goal: provide outputs which match what solvers expect. Other AD packages focus on differentiating individual Julia functions. In contrast, ReverseAD has a very specific use-case: to generate the outputs needed by the MOI nonlinear API. This means it needs to efficiently compute sparse Hessians, and it needs subexpression handling to avoid recomputing subexpressions that are shared between functions.
History
ReverseAD started life as ReverseDiffSparse.jl, development of which began in early 2014(!). This was well before the other AD packages started development. Because we had a well-tested, working AD in JuMP, there was less motivation to contribute to and explore other AD packages. The lack of historical interaction also meant that other packages were not optimized for the types of problems that JuMP is built for (that is, large-scale sparse problems). When we first created MathOptInterface, we kept the AD in JuMP to simplify the transition, and postponed the development of a first-class nonlinear interface in MathOptInterface.
Prior to the introduction of Nonlinear, JuMP's nonlinear implementation was a confusing mix of functions and types spread across the code base and in the private _Derivatives submodule. This made it hard to swap the AD system for another. The main motivation for refactoring JuMP to create the Nonlinear submodule in MathOptInterface was to abstract the interface between JuMP and the AD system, allowing us to swap in and test new AD systems in the future.