SymbolicAD

The Nonlinear.SymbolicAD submodule contains data structures and functions for working with the symbolic derivatives of a nonlinear optimization problem. This page explains the API and describes the rationale behind its design.

Background

The code in SymbolicAD is inspired by Hassan Hijazi's work on coin-or/gravity, a high-performance algebraic modeling language in C++.

Hassan made the following observations:

  • For large-scale models, symbolic differentiation is slower than other automatic differentiation techniques, such as the reverse-mode algorithm implemented in MOI.Nonlinear.ReverseAD.
  • However, most large-scale nonlinear programs have a lot of structure.
  • Gravity asks the user to provide structure in the form of template constraints, where the user gives the symbolic form of the constraint as well as a set of data to convert from a symbolic form to the numerical form.
  • Instead of differentiating each constraint in its numerical form, we can compute one symbolic derivative of the constraint in symbolic form, and then plug in the data to get the numerical derivative of each function.
  • As a final step, if users don't provide the structure, we can still infer it, perhaps with less accuracy, by comparing the expression tree of each constraint.

The symbolic differentiation approach of Gravity works well when the problem is large but has few unique constraints. For example, a model like:

using JuMP
model = Model()
@variable(model, 0 <= x[1:10_000] <= 1)
@constraint(model, [i=1:10_000], sin(x[i]) <= 1)
@objective(model, Max, sum(x))

is ideal, because although the Jacobian matrix has 10,000 rows, we can compute the derivative of sin(x[i]) as cos(x[i]) once, and then fill in the Jacobian by evaluating the derivative function instead of having to differentiate 10,000 expressions.
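
As a rough sketch of the idea in plain Julia (an illustration only, not the SymbolicAD implementation):

# The constraint template sin(x[i]) <= 1 has a single symbolic derivative,
# cos(x[i]), which we compute once by hand here.
df = cos
# Evaluating that one derivative function at each data point fills in all
# 10,000 Jacobian entries without differentiating 10,000 expressions.
x = rand(10_000)
jacobian_nonzeros = df.(x)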

The symbolic differentiation approach of Gravity works poorly if there are a large number of unique constraints in the model (which would require a lot of expressions to be symbolically differentiated), or if the nonlinear functions contain a large number of nonlinear terms (which would make the symbolic derivative expensive to compute).

SymbolicAD started life as MathOptSymbolicAD.jl, development of which began in early 2022. This initial version of SymbolicAD used the Symbolics.jl package to compute the symbolic derivatives. In 2025, we rewrote MathOptSymbolicAD.jl to remove the dependence on Symbolics.jl, and, since the rewrite depended only on MathOptInterface, we added it to MOI.Nonlinear as a new submodule.

For more details on MathOptSymbolicAD.jl, see Oscar's JuMP-dev 2022 talk, although note that the syntax has changed since the original recording.

Use SymbolicAD with JuMP

To use SymbolicAD with JuMP, set the AutomaticDifferentiationBackend attribute to Nonlinear.SymbolicMode:

using JuMP, Ipopt
model = Model(Ipopt.Optimizer)
set_attribute(
    model,
    MOI.AutomaticDifferentiationBackend(),
    MOI.Nonlinear.SymbolicMode(),
)
@variable(model, x[1:2])
@objective(model, Min, (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2)
optimize!(model)

To revert to the default sparse reverse-mode algorithm, set the AutomaticDifferentiationBackend attribute to Nonlinear.SparseReverseMode.
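
For example, reusing model from the code above:

set_attribute(
    model,
    MOI.AutomaticDifferentiationBackend(),
    MOI.Nonlinear.SparseReverseMode(),
)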

simplify

Use Nonlinear.SymbolicAD.simplify to simplify nonlinear expressions. The simplification algorithm performs simple rewrites such as lifting nested summations:

julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> f = MOI.ScalarNonlinearFunction(
           :+,
           Any[MOI.ScalarNonlinearFunction(:+, Any[1.0, x]), 2.0 * x + 3.0],
       )
+(+(1.0, MOI.VariableIndex(1)), 3.0 + 2.0 MOI.VariableIndex(1))

julia> MOI.Nonlinear.SymbolicAD.simplify(f)
4.0 + 3.0 MOI.VariableIndex(1)

and trivial identities such as $x^1 = x$:

julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> f = MOI.ScalarNonlinearFunction(:^, Any[x, 1])
^(MOI.VariableIndex(1), (1))

julia> MOI.Nonlinear.SymbolicAD.simplify(f)
MOI.VariableIndex(1)

The list of rewrites that will be made is intentionally limited to keep the codebase simple. Nonlinear.SymbolicAD is not a substitute for a Computer Algebra System (CAS). For example, we do not detect the trigonometric identity $\sin(x)^2 + \cos(x)^2 = 1$:

julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> sin_x = MOI.ScalarNonlinearFunction(:sin, Any[x])
sin(MOI.VariableIndex(1))

julia> cos_x = MOI.ScalarNonlinearFunction(:cos, Any[x])
cos(MOI.VariableIndex(1))

julia> f = MOI.ScalarNonlinearFunction(
           :+,
           Any[
               MOI.ScalarNonlinearFunction(:^, Any[sin_x, 2]),
               MOI.ScalarNonlinearFunction(:^, Any[cos_x, 2]),
           ],
       )
+(^(sin(MOI.VariableIndex(1)), (2)), ^(cos(MOI.VariableIndex(1)), (2)))

julia> MOI.Nonlinear.SymbolicAD.simplify(f)
+(^(sin(MOI.VariableIndex(1)), (2)), ^(cos(MOI.VariableIndex(1)), (2)))

In addition to Nonlinear.SymbolicAD.simplify, there is an in-place version, Nonlinear.SymbolicAD.simplify!, which may modify the existing function.
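
For example, repeating the $x^1 = x$ identity from above:

julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> f = MOI.ScalarNonlinearFunction(:^, Any[x, 1])
^(MOI.VariableIndex(1), (1))

julia> MOI.Nonlinear.SymbolicAD.simplify!(f)
MOI.VariableIndex(1)

Because the argument may be mutated, use simplify! only when you no longer need the original expression.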

variables

Use Nonlinear.SymbolicAD.variables to return a sorted list of the variables that appear in the function:

julia> x = MOI.VariableIndex.(1:3)
3-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(2)
 MOI.VariableIndex(3)

julia> f = MOI.ScalarNonlinearFunction(:atan, Any[x[3], 2.0 * x[1]])
atan(MOI.VariableIndex(3), 0.0 + 2.0 MOI.VariableIndex(1))

julia> MOI.Nonlinear.SymbolicAD.variables(f)
2-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(3)

derivative

Use Nonlinear.SymbolicAD.derivative to compute the symbolic derivative of a function with respect to a decision variable:

julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> f = MOI.ScalarNonlinearFunction(:sin, Any[x])
sin(MOI.VariableIndex(1))

julia> MOI.Nonlinear.SymbolicAD.derivative(f, x)
cos(MOI.VariableIndex(1))

Note that the resulting expression can often be simplified. In most cases, you should call Nonlinear.SymbolicAD.simplify (or the in-place Nonlinear.SymbolicAD.simplify!) on the result before using it elsewhere:

julia> x = MOI.VariableIndex(1)
MOI.VariableIndex(1)

julia> f = MOI.ScalarNonlinearFunction(:sin, Any[x + 1.0])
sin(1.0 + 1.0 MOI.VariableIndex(1))

julia> df_dx = MOI.Nonlinear.SymbolicAD.derivative(f, x)
*(cos(1.0 + 1.0 MOI.VariableIndex(1)), 1.0)

julia> MOI.Nonlinear.SymbolicAD.simplify!(df_dx)
cos(1.0 + 1.0 MOI.VariableIndex(1))

gradient_and_hessian

In some cases, you may want to compute the gradient and (sparse) Hessian matrix of a function. One way to achieve this is to call Nonlinear.SymbolicAD.derivative once for each variable in Nonlinear.SymbolicAD.variables to obtain the gradient, and then call it again on each element of the gradient to obtain the second derivatives.
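
For example, here is a sketch of that manual approach, using only the functions documented above (outputs are suppressed with ; for brevity):

julia> x = MOI.VariableIndex.(1:2);

julia> f = MOI.ScalarNonlinearFunction(:sin, Any[1.0 * x[1] + 2.0 * x[2]]);

julia> y = MOI.Nonlinear.SymbolicAD.variables(f);

julia> ∇f = [MOI.Nonlinear.SymbolicAD.derivative(f, yi) for yi in y];

julia> ∇²f = [MOI.Nonlinear.SymbolicAD.derivative(∇f[i], y[j]) for i in eachindex(y), j in eachindex(y)];

This approach recomputes shared work, returns unsimplified expressions, and ignores the symmetry and sparsity of the Hessian. To simplify the process, use Nonlinear.SymbolicAD.gradient_and_hessian: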

julia> x = MOI.VariableIndex.(1:2);

julia> f = MOI.ScalarNonlinearFunction(:sin, Any[1.0 * x[1] + 2.0 * x[2]])
sin(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2))

julia> y, ∇f, H, ∇²f = MOI.Nonlinear.SymbolicAD.gradient_and_hessian(f);

julia> y
2-element Vector{MathOptInterface.VariableIndex}:
 MOI.VariableIndex(1)
 MOI.VariableIndex(2)

julia> ∇f
2-element Vector{Any}:
 cos(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2))
 *(cos(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2)), 2.0)

julia> H
3-element Vector{Tuple{Int64, Int64}}:
 (1, 1)
 (1, 2)
 (2, 2)

julia> ∇²f
3-element Vector{Any}:
 -(sin(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2)))
 *(-(sin(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2))), 2.0)
 *(-(sin(0.0 + 1.0 MOI.VariableIndex(1) + 2.0 MOI.VariableIndex(2))), 4.0)

where:

  • y is the list of variables that appear in f
  • ∇f is the vector of first partial derivatives of f with respect to each variable in y
  • H and ∇²f together form a sparse Hessian matrix, where H is the (row, column) index of each nonzero element, and the corresponding element in ∇²f is the second partial derivative of f with respect to y[row] and y[column].

Unlike the output of Nonlinear.SymbolicAD.derivative, the gradient and Hessian expressions returned by Nonlinear.SymbolicAD.gradient_and_hessian have already been simplified; you do not need to call Nonlinear.SymbolicAD.simplify.
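
As a sketch of how you might consume this format, the following builds a dense matrix of expressions from H and ∇²f, assuming (as in the example above) that each (row, column) pair is stored once with row ≤ column:

julia> n = length(y);

julia> dense_H = Any[0.0 for _ in 1:n, _ in 1:n];

julia> for ((i, j), h) in zip(H, ∇²f)
           dense_H[i, j] = h
           dense_H[j, i] = h  # mirror the entry: only one triangle is stored
       end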