Automatic differentiation of user-defined operators
This tutorial was generated using Literate.jl. Download the source as a .jl
file.
The purpose of this tutorial is to demonstrate how to apply automatic differentiation to User-defined operators.
This tutorial is for advanced users. As an alternative, consider using Function tracing instead of creating an operator, and if an operator is necessary, consider using the default of @operator(model, op_f, N, f)
instead of passing explicit Gradients and Hessians.
Required packages
This tutorial uses the following packages:
using JuMP
import DifferentiationInterface
import Enzyme
import ForwardDiff
import Ipopt
import Test
Primal function
As a simple example, we consider the Rosenbrock function:
f(x::T...) where {T} = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
f (generic function with 1 method)
Here's the value at a random point:
x = rand(2)
2-element Vector{Float64}:
0.8809678954777748
0.5901641516460916
f(x...)
3.4715474597921627
Analytic derivative
If expressions are simple enough, you can provide analytic functions for the gradient and Hessian.
Gradient
The Rosenbrock function has the gradient vector:
function analytic_∇f(g::AbstractVector, x...)
g[1] = 400 * x[1]^3 - 400 * x[1] * x[2] + 2 * x[1] - 2
g[2] = 200 * (x[2] - x[1]^2)
return
end
analytic_∇f (generic function with 1 method)
Let's evaluate it at the same vector x
:
analytic_g = zeros(2)
analytic_∇f(analytic_g, x...)
analytic_g
2-element Vector{Float64}:
65.28490308207542
-37.18805624328958
Hessian
The Hessian matrix is:
function analytic_∇²f(H::AbstractMatrix, x...)
H[1, 1] = 1200 * x[1]^2 - 400 * x[2] + 2
# H[1, 2] = -400 * x[1] <-- not needed because Hessian is symmetric
H[2, 1] = -400 * x[1]
H[2, 2] = 200.0
return
end
analytic_∇²f (generic function with 1 method)
Note that because the Hessian is symmetric, JuMP requires that we fill in only the lower triangle.
analytic_H = zeros(2, 2)
analytic_∇²f(analytic_H, x...)
analytic_H
2×2 Matrix{Float64}:
697.26 0.0
-352.387 200.0
JuMP example
Putting our analytic functions together, we get:
function analytic_rosenbrock()
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[1:2])
@operator(model, op_rosenbrock, 2, f, analytic_∇f, analytic_∇²f)
@objective(model, Min, op_rosenbrock(x[1], x[2]))
optimize!(model)
Test.@test is_solved_and_feasible(model)
return value.(x)
end
analytic_rosenbrock()
2-element Vector{Float64}:
0.9999999999999655
0.9999999999999305
ForwardDiff
Instead of analytic functions, you can use ForwardDiff.jl to compute derivatives.
If you do not specify a gradient or Hessian, JuMP will use ForwardDiff.jl to compute derivatives by default. We provide this section as a worked example of what is going on under the hood.
Pros and cons
The main benefit of ForwardDiff is that it is simple, robust, and works with a broad range of Julia syntax.
The main downside is that f
must be a function that accepts arguments of x::Real...
. See Common mistakes when writing a user-defined operator for more details.
Gradient
The gradient can be computed using ForwardDiff.gradient!
. Note that ForwardDiff expects a single Vector{T}
argument, but we receive x
as a tuple, so we need y -> f(y...)
and collect(x)
to get things in the right format.
function fdiff_∇f(g::AbstractVector{T}, x::Vararg{T,N}) where {T,N}
ForwardDiff.gradient!(g, y -> f(y...), collect(x))
return
end
fdiff_∇f (generic function with 1 method)
Let's check that we find the analytic solution:
fdiff_g = zeros(2)
fdiff_∇f(fdiff_g, x...)
Test.@test ≈(analytic_g, fdiff_g)
Test Passed
Hessian
The Hessian is a bit more complicated, but code to implement it is:
function fdiff_∇²f(H::AbstractMatrix{T}, x::Vararg{T,N}) where {T,N}
h = ForwardDiff.hessian(y -> f(y...), collect(x))
for i in 1:N, j in 1:i
H[i, j] = h[i, j]
end
return
end
fdiff_∇²f (generic function with 1 method)
Let's check that we find the analytic solution:
fdiff_H = zeros(2, 2)
fdiff_∇²f(fdiff_H, x...)
Test.@test ≈(analytic_H, fdiff_H)
Test Passed
JuMP example
The code for computing the gradient and Hessian using ForwardDiff can be re-used for many operators. Thus, it is helpful to encapsulate it into the function:
"""
fdiff_derivatives(f::Function) -> Tuple{Function,Function}
Return a tuple of functions that evaluate the gradient and Hessian of `f` using
ForwardDiff.jl.
"""
function fdiff_derivatives(f::Function)
function ∇f(g::AbstractVector{T}, x::Vararg{T,N}) where {T,N}
ForwardDiff.gradient!(g, y -> f(y...), collect(x))
return
end
function ∇²f(H::AbstractMatrix{T}, x::Vararg{T,N}) where {T,N}
h = ForwardDiff.hessian(y -> f(y...), collect(x))
for i in 1:N, j in 1:i
H[i, j] = h[i, j]
end
return
end
return ∇f, ∇²f
end
Main.fdiff_derivatives
Here's an example using fdiff_derivatives
:
function fdiff_rosenbrock()
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[1:2])
@operator(model, op_rosenbrock, 2, f, fdiff_derivatives(f)...)
@objective(model, Min, op_rosenbrock(x[1], x[2]))
optimize!(model)
Test.@test is_solved_and_feasible(model)
return value.(x)
end
fdiff_rosenbrock()
2-element Vector{Float64}:
0.9999999999999899
0.9999999999999792
Enzyme
Another library for automatic differentiation in Julia is Enzyme.jl.
Pros and cons
The main benefit of Enzyme is that it can produce fast derivatives for functions with many input arguments.
The main downsides are that it may throw unusual errors if your code uses an unsupported feature of Julia and that it may have large compile times.
The JuMP developers cannot help you debug error messages related to Enzyme. If the operator works, it works. If not, we suggest you try a different automatic differentiation library. See juliadiff.org for details.
Gradient
The gradient can be computed using Enzyme.autodiff
with the Enzyme.Reverse
mode. We need to wrap x
in Enzyme.Active
to indicate that we want to compute the derivatives with respect to these arguments.
function enzyme_∇f(g::AbstractVector{T}, x::Vararg{T,N}) where {T,N}
g .= Enzyme.autodiff(Enzyme.Reverse, f, Enzyme.Active.(x)...)[1]
return
end
enzyme_∇f (generic function with 1 method)
Let's check that we find the analytic solution:
enzyme_g = zeros(2)
enzyme_∇f(enzyme_g, x...)
Test.@test ≈(analytic_g, enzyme_g)
Test Passed
Hessian
We can compute the Hessian in Enzyme using forward-over-reverse automatic differentiation.
The code to implement the Hessian in Enzyme is complicated, so we will not explain it in detail; see the Enzyme documentation.
function enzyme_∇²f(H::AbstractMatrix{T}, x::Vararg{T,N}) where {T,N}
# direction(i) returns a tuple with a `1` in the `i`'th entry and `0`
# otherwise
direction(i) = ntuple(j -> Enzyme.Active(T(i == j)), N)
# As the inner function, compute the gradient using Reverse mode
∇f(x...) = Enzyme.autodiff(Enzyme.Reverse, f, Enzyme.Active, x...)[1]
# For the outer autodiff, use Forward mode.
hess = Enzyme.autodiff(
Enzyme.Forward,
∇f,
# Compute multiple evaluations of Forward mode, each time using `x` but
# initializing with a different direction.
Enzyme.BatchDuplicated.(Enzyme.Active.(x), ntuple(direction, N))...,
)[1]
# Unpack Enzyme's `hess` data structure into the matrix `H` expected by
# JuMP.
for j in 1:N, i in 1:j
H[j, i] = hess[j][i]
end
return
end
enzyme_∇²f (generic function with 1 method)
Let's check that we find the analytic solution:
enzyme_H = zeros(2, 2)
enzyme_∇²f(enzyme_H, x...)
Test.@test ≈(analytic_H, enzyme_H)
Test Passed
JuMP example
The code for computing the gradient and Hessian using Enzyme can be re-used for many operators. Thus, it is helpful to encapsulate it into the function:
"""
enzyme_derivatives(f::Function) -> Tuple{Function,Function}
Return a tuple of functions that evaluate the gradient and Hessian of `f` using
Enzyme.jl.
"""
function enzyme_derivatives(f::Function)
function ∇f(g::AbstractVector{T}, x::Vararg{T,N}) where {T,N}
g .= Enzyme.autodiff(Enzyme.Reverse, f, Enzyme.Active.(x)...)[1]
return
end
function ∇²f(H::AbstractMatrix{T}, x::Vararg{T,N}) where {T,N}
direction(i) = ntuple(j -> Enzyme.Active(T(i == j)), N)
∇f(x...) = Enzyme.autodiff(Enzyme.Reverse, f, Enzyme.Active, x...)[1]
hess = Enzyme.autodiff(
Enzyme.Forward,
∇f,
Enzyme.BatchDuplicated.(Enzyme.Active.(x), ntuple(direction, N))...,
)[1]
for j in 1:N, i in 1:j
H[j, i] = hess[j][i]
end
return
end
return ∇f, ∇²f
end
Main.enzyme_derivatives
Here's an example using enzyme_derivatives
:
function enzyme_rosenbrock()
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[1:2])
@operator(model, op_rosenbrock, 2, f, enzyme_derivatives(f)...)
@objective(model, Min, op_rosenbrock(x[1], x[2]))
optimize!(model)
Test.@test is_solved_and_feasible(model)
return value.(x)
end
enzyme_rosenbrock()
2-element Vector{Float64}:
0.9999999999999899
0.9999999999999792
DifferentiationInterface
Julia offers many different autodiff packages. DifferentiationInterface.jl is a package that provides an abstraction layer across a few underlying autodiff libraries.
The JuMP developers cannot help you debug error messages related to DifferentiationInterface. If the operator works, it works. If not, we suggest you directly try using a different automatic differentiation library rather than the DI wrapper. See juliadiff.org for details.
All the necessary information about your choice of underlying autodiff package is encoded in a "backend object" like this one:
DifferentiationInterface.AutoForwardDiff()
ADTypes.AutoForwardDiff()
This type comes from another package called ADTypes.jl, but DifferentiationInterface re-exports it. Other options include AutoZygote()
and AutoFiniteDiff()
.
Gradient
Apart from providing the backend object, the syntax below remains very similar:
function di_∇f(
g::AbstractVector{T},
x::Vararg{T,N};
backend = DifferentiationInterface.AutoForwardDiff(),
) where {T,N}
DifferentiationInterface.gradient!(splat(f), g, backend, collect(x))
return
end
di_∇f (generic function with 1 method)
Let's check that we find the analytic solution:
di_g = zeros(2)
di_∇f(di_g, x...)
Test.@test ≈(analytic_g, di_g)
Test Passed
Hessian
The Hessian follows exactly the same logic, except we need only the lower triangle.
function di_∇²f(
H::AbstractMatrix{T},
x::Vararg{T,N};
backend = DifferentiationInterface.AutoForwardDiff(),
) where {T,N}
H_dense = DifferentiationInterface.hessian(splat(f), backend, collect(x))
for i in 1:N, j in 1:i
H[i, j] = H_dense[i, j]
end
return
end
di_∇²f (generic function with 1 method)
Let's check that we find the analytic solution:
di_H = zeros(2, 2)
di_∇²f(di_H, x...)
Test.@test ≈(analytic_H, di_H)
Test Passed
JuMP example
The code for computing the gradient and Hessian using DifferentiationInterface can be re-used for many operators. Thus, it is helpful to encapsulate it into the function:
"""
di_derivatives(f::Function; backend) -> Tuple{Function,Function}
Return a tuple of functions that evaluate the gradient and Hessian of `f` using
DifferentiationInterface.jl with any given `backend`.
"""
function di_derivatives(f::Function; backend)
function ∇f(g::AbstractVector{T}, x::Vararg{T,N}) where {T,N}
DifferentiationInterface.gradient!(splat(f), g, backend, collect(x))
return
end
function ∇²f(H::AbstractMatrix{T}, x::Vararg{T,N}) where {T,N}
H_dense =
DifferentiationInterface.hessian(splat(f), backend, collect(x))
for i in 1:N, j in 1:i
H[i, j] = H_dense[i, j]
end
return
end
return ∇f, ∇²f
end
Main.di_derivatives
Here's an example using di_derivatives
:
function di_rosenbrock(; backend)
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[1:2])
@operator(model, op_rosenbrock, 2, f, di_derivatives(f; backend)...)
@objective(model, Min, op_rosenbrock(x[1], x[2]))
optimize!(model)
Test.@test is_solved_and_feasible(model)
return value.(x)
end
di_rosenbrock(; backend = DifferentiationInterface.AutoForwardDiff())
2-element Vector{Float64}:
0.9999999999999899
0.9999999999999792