Sensitivity Analysis of SVM

This notebook illustrates sensitivity analysis of data points in a Support Vector Machine (inspired by @matbesancon's SimpleSVMs).

For reference, Section 10.1 of https://online.stat.psu.edu/stat508/book/export/html/792 gives an intuitive explanation of what it means for a hyperplane or a data point to be sensitive. The general form of the SVM training problem (without regularization) is given below:

\[\begin{split} \begin{array} {ll} \mbox{minimize} & \sum_{i=1}^{N} \xi_{i} \\ \mbox{s.t.} & \xi_{i} \ge 0 \quad i=1,\ldots,N \\ & y_{i} (w^T X_{i} + b) \ge 1 - \xi_{i}\\ \end{array} \end{split}\]

where

  • X, y are the N data points (coordinates and labels);
  • ξ are the soft-margin slack variables, whose sum is the loss.
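
For intuition: a point that satisfies the margin, y_i (w^T X_i + b) ≥ 1, can take ξ_i = 0 and contributes nothing to the objective, while a point with, say, y_i (w^T X_i + b) = 0.3 forces ξ_i ≥ 0.7. At an optimum, each slack therefore equals the hinge loss:

\[\xi_{i}^{\star} = \max\{0,\; 1 - y_{i} (w^T X_{i} + b)\}.\]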

Define and solve the SVM

Import the libraries.

using SCS, DiffOpt, LinearAlgebra, JuMP
import Random, Plots

Construct separable, non-trivial data points.

N = 100
D = 2
Random.seed!(62)
X = vcat(randn(N ÷ 2, D), randn(N ÷ 2, D) .+ [4.0, 1.5]')
y = append!(ones(N ÷ 2), -ones(N ÷ 2));

Let's initialize a special model that can be differentiated: DiffOpt's diff_optimizer wraps the SCS solver and records what is needed to compute sensitivities.

model = Model(() -> diff_optimizer(SCS.Optimizer))
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: SCS

Add the variables: the slack vector l (the ξ of the formulation above), the weights w, and the bias b.

@variable(model, l[1:N])
@variable(model, w[1:D])
@variable(model, b);

Add the constraints. Both are posed in vectorized conic form (MOI.Nonnegatives), which is the form DiffOpt differentiates.

@constraint(
    model,
    1.0 * l ∈ MOI.Nonnegatives(N),
)
@constraint(
    model,
    cons,
    y .* (X * w .+ b) + l .- 1 ∈ MOI.Nonnegatives(N),
);
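
Since cons is a single vector-valued constraint, we can confirm it has one scalar row per data point (an illustrative check, not part of the original example):

@assert MOI.dimension(constraint_object(cons).set) == N  # one row per data point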

Define the objective and solve.

@objective(
    model,
    Min,
    sum(l),
)

optimize!(model) # solve
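
Before differentiating, it is worth confirming that SCS converged; a minimal sanity check using the standard JuMP status API (not part of the original example):

@assert termination_status(model) in (MOI.OPTIMAL, MOI.ALMOST_OPTIMAL)  # solver reported success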

We can visualize the separating hyperplane.

loss = objective_value(model)
wv = value.(w)
bv = value(b)

svm_x = [0.0, 3.0]  # x₁ range over which to draw the line
svm_y = (-bv .- wv[1] * svm_x) / wv[2]  # solve w₁x₁ + w₂x₂ + b = 0 for x₂

p = Plots.scatter(X[:,1], X[:,2], color = [yi > 0 ? :red : :blue for yi in y], label = "")
Plots.yaxis!(p, (-2, 4.5))
Plots.plot!(p, svm_x, svm_y, label = "loss = $(round(loss, digits=2))", width=3)

Experiments

Now that we've solved the SVM, we can use DiffOpt to compute the sensitivity of the optimal values (the separating hyperplane) with respect to perturbations of the problem data (the data points). We explore two questions:

  • How does a change in the labels of the data points (y=1 to y=-1, and vice versa) affect the position of the hyperplane? This is answered by finding the gradient of w, b with respect to y[i], the classification label of the i-th data point.
  • How does a change in the coordinates of the data points, X, affect the position of the hyperplane? This is answered by finding the gradient of w, b with respect to X[i], the 2-D coordinates of the i-th data point. (The first-order effect of both perturbations is derived just after this list.)
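
In both cases the perturbation enters the problem only through the second set of constraints. Differentiating the i-th constraint function y_i (X_i^T w + b) + ξ_i - 1 with respect to the data gives, to first order,

\[\frac{\partial}{\partial y_i} \left[ y_i (X_i^T w + b) + \xi_i - 1 \right] = X_i^T w + b, \qquad \frac{\partial}{\partial X_{ij}} \left[ y_i (X_i^T w + b) + \xi_i - 1 \right] = y_i w_j .\]

Both expressions are affine in the decision variables, which is why the perturbation directions passed to DiffOpt's forward mode below are built from the variables b and w rather than from constants.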

Note that training the SVM can be modelled as a conic optimization problem:

\[\begin{align*} & \min_{x \in \mathbb{R}^n} & c^T x \\ & \text{s.t.} & A x + s = b \\ & & s \in \mathcal{K} \end{align*}\]

where

\[\begin{align*} x &= (\xi_1, \ldots, \xi_N, w_1, \ldots, w_D, b) \in \mathbb{R}^{N+D+1} \\\\ c &= (\underbrace{1, \ldots, 1}_{N \text{ times}}, \underbrace{0, \ldots, 0}_{D+1 \text{ times}}) \\\\ A &= \begin{bmatrix} -I_N & 0 & 0 \\ -I_N & -\operatorname{diag}(y)\, X & -y \end{bmatrix} \in \mathbb{R}^{2N \times (N+D+1)} \\\\ b &= (\underbrace{0, \ldots, 0}_{N \text{ times}}, \underbrace{-1, \ldots, -1}_{N \text{ times}}) \\\\ \mathcal{K} &= \mathbb{R}^{2N}_{\ge 0} \quad \text{(a product of nonnegative cones)} \end{align*}\]

The first N rows encode ξ ≥ 0 and the last N rows encode y_i (X_i^T w + b) + ξ_i ≥ 1; the conic right-hand side b above is distinct from the scalar bias b, which is the last entry of x.
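
As a concrete check, this conic data can be assembled directly with LinearAlgebra (an illustrative sketch, not part of the original example; the names Acon, bcon, ccon are hypothetical):

# Assemble the conic data for x = (ξ; w; b) and verify its shape and objective.
Acon = [
    -I(N)  zeros(N, D)          zeros(N)
    -I(N)  (-Diagonal(y) * X)   (-y)
]
bcon = [zeros(N); -ones(N)]
ccon = [ones(N); zeros(D + 1)]
@assert size(Acon) == (2N, N + D + 1)
xstar = [value.(l); value.(w); value(b)]
@assert isapprox(ccon' * xstar, loss; atol = 1e-3)  # matches the objective up to SCS tolerance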

Experiment 1: Gradient of hyperplane wrt the data point labels

Construct perturbations in the data point labels y while keeping the coordinates X fixed.

∇ = Float64[]
dy = zeros(N);

Begin differentiating: perturb one label at a time, run the forward pass, and record the norm of the resulting change in (w, b).

for Xi in 1:N
    dy[Xi] = 1.0  # set a unit perturbation in the label of point Xi

    MOI.set(
        model,
        DiffOpt.ForwardInConstraint(),
        cons,
        MOI.Utilities.vectorize(dy .* MOI.SingleVariable(b)),
    )

    DiffOpt.forward(model)

    dw = MOI.get.(
        model,
        DiffOpt.ForwardOutVariablePrimal(),
        w
    )
    db = MOI.get(
        model,
        DiffOpt.ForwardOutVariablePrimal(),
        b
    )
    push!(∇, norm(dw) + norm(db))

    dy[Xi] = 0.0  # reset the change made above
end
normalize!(∇);
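
Because the forward derivative is linear in the perturbation, these norms are directly comparable across points. To list the most influential labels, one can rank them (an illustrative one-liner; top5 is a hypothetical name):

top5 = sortperm(∇; rev = true)[1:5]  # indices of the five most label-sensitive points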

Visualize the point sensitivities with respect to the separating hyperplane. Note that the gradients are normalized, and each point's marker size is proportional to its sensitivity.

p2 = Plots.scatter(
    X[:,1], X[:,2],
    color = [yi > 0 ? :red : :blue for yi in y], label = "",
    markersize = ∇ * 20,
)
Plots.yaxis!(p2, (-2, 4.5))
Plots.plot!(p2, svm_x, svm_y, label = "loss = $(round(loss, digits=2))", width=3)

Experiment 2: Gradient of hyperplane wrt the data point coordinates

As in the previous experiment, construct perturbations in the data point coordinates X.

∇ = Float64[]
dX = zeros(N, D);

Begin differentiating: perturb the coordinates of one data point at a time and run the forward pass.

for Xi in 1:N
    dX[Xi, :] = ones(D)  # set a unit perturbation in both coordinates of point Xi

    for i in 1:D
        MOI.set(
            model,
            DiffOpt.ForwardInConstraint(),
            cons,
            MOI.Utilities.vectorize(dX[:,i] .* MOI.SingleVariable(w[i])),
        )
    end

    DiffOpt.forward(model)

    dw = MOI.get.(
        model,
        DiffOpt.ForwardOutVariablePrimal(),
        w
    )
    db = MOI.get(
        model,
        DiffOpt.ForwardOutVariablePrimal(),
        b
    )
    push!(∇, norm(dw) + norm(db))

    dX[Xi, :] = zeros(D)  # reset the change made above
end
normalize!(∇);

We can visualize point sensitivity with respect to the separating hyperplane. Note that the gradients are normalized.

p3 = Plots.scatter(
    X[:,1], X[:,2],
    color = [yi > 0 ? :red : :blue for yi in y], label = "",
    markersize = ∇ * 20,
)
Plots.yaxis!(p3, (-2, 4.5))
Plots.plot!(p3, svm_x, svm_y, label = "loss = $(round(loss, digits=2))", width=3)

This page was generated using Literate.jl.