Sensitivity Analysis of Ridge Regression

This example illustrates sensitivity analysis of a ridge regression problem with respect to its data points. The general form of the problem is given below:

\[\begin{split} \begin{array} {ll} \mbox{minimize} & \sum_{i=1}^{N} (y_{i} - w x_{i} - b)^2 + \alpha w^2 \\ \end{array} \end{split}\]

where

  • w, b are the slope and intercept of the regression line
  • x, y are the N data points
  • α is the regularization constant

which is equivalent to:

\[\begin{split} \begin{array} {ll} \mbox{minimize} & e^{\top}e + \alpha (w^2) \\ \mbox{s.t.} & e_{i} = y_{i} - w x_{i} - b \quad \quad i=1..N \\ \end{array} \end{split}\]
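Because the objective is a convex quadratic in (w, b), the optimum also has a closed form. For reference (the tutorial instead solves the problem numerically, and the implementation below scales the squared error by 1/N):

\[\hat{w} = \frac{S_{xy}}{S_{xx} + \alpha}, \qquad \hat{b} = \bar{y} - \hat{w} \bar{x},\]

where \(\bar{x}\) and \(\bar{y}\) are the sample means, \(S_{xx} = \frac{1}{N}\sum_{i}(x_i - \bar{x})^2\), and \(S_{xy} = \frac{1}{N}\sum_{i}(x_i - \bar{x})(y_i - \bar{y})\).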

This tutorial uses the following packages:

using JuMP
import DiffOpt
import Random
import Ipopt
import Plots
using LinearAlgebra: dot

Define and solve the problem

Construct a set of noisy data points (Gaussian noise) around a line.

Random.seed!(42)

N = 150

w_orig = 2 * abs(randn())
b = rand()
X = randn(N)
Y = w_orig * X .+ b + 0.8 * randn(N);

The helper function build_fit_ridge defines and solves the corresponding model, and optimize_fit_ridge! re-solves it for a new value of the regularization parameter. The ridge regression is modeled as a quadratic program (quadratic objective, with the residuals defined as affine expressions) and solved via the generic methods of Ipopt; note that the implementation scales the squared error by 1/N. This is not the standard way of solving a ridge regression problem; it is done here for didactic purposes.

function build_fit_ridge(X_data, Y_data, alpha = 0.1)
    N = length(Y_data)
    # Initialize a JuMP Model with Ipopt solver
    # model = Model(() -> DiffOpt.diff_optimizer(Ipopt.Optimizer)) # TODO: this is not auto detecting scalar nonlinear function (alpha * w * w)
    model = DiffOpt.nonlinear_diff_model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, w) # angular coefficient
    @variable(model, b) # linear coefficient
    @variable(model, α in Parameter(alpha)) # regularization parameter
    @variable(model, X[1:N] in Parameter.(X_data))
    @variable(model, Y[1:N] in Parameter.(Y_data))
    # expression defining approximation error
    @expression(model, e[i=1:N], Y[i] - w * X[i] - b)
    # objective minimizing squared error and ridge penalty
    @objective(model, Min, 1 / N * dot(e, e) + α * (w^2))
    optimize!(model)
    return model
end

function optimize_fit_ridge!(model, alpha)
    set_parameter_value(model[:α], alpha)
    optimize!(model)
    return model[:w], model[:b]
end

Plot the data points and the fitted lines for different values of alpha.

p = Plots.scatter(X, Y; label = nothing, legend = :topleft)
mi, ma = minimum(X), maximum(X)
Plots.title!("Fitted lines and points")

model = build_fit_ridge(X, Y)
for alpha in 0.5:0.5:1.5
    local w, b = optimize_fit_ridge!(model, alpha)
    ŵ = value(w)
    b̂ = value(b)
    Plots.plot!(
        p,
        [mi, ma],
        [mi * ŵ + b̂, ma * ŵ + b̂];
        label = "alpha=$alpha",
        width = 2,
    )
end
p

Differentiate

Now that we've solved the problem, we can compute the sensitivity of the optimal slope w with respect to perturbations of the data points (x, y).

alpha = 0.4
w, b = optimize_fit_ridge!(model, alpha)
ŵ = value(w)
b̂ = value(b)

Sensitivity with respect to x and y

∇y = zero(X)
∇x = zero(X)
for i in 1:N
    # X[i] sensitivity
    DiffOpt.empty_input_sensitivities!(model)
    DiffOpt.set_forward_parameter(model, model[:X][i], 1.0)
    DiffOpt.forward_differentiate!(model)
    ∇x[i] = DiffOpt.get_forward_variable(model, w)
    # Y[i] sensitivity
    DiffOpt.empty_input_sensitivities!(model)
    DiffOpt.set_forward_parameter(model, model[:Y][i], 1.0)
    DiffOpt.forward_differentiate!(model)
    ∇y[i] = DiffOpt.get_forward_variable(model, w)
end
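Since this quadratic program has a closed-form solution, the forward-mode sensitivities above can be cross-checked analytically. The sketch below is not part of DiffOpt; the function name is illustrative, and it assumes the same 1/N-scaled objective used in build_fit_ridge.

```julia
# Illustrative cross-check (not part of DiffOpt): analytic sensitivities of the
# ridge slope for the objective (1/N) * sum(e.^2) + alpha * w^2.
function analytic_slope_sensitivities(X, Y, alpha)
    N = length(X)
    x̄, ȳ = sum(X) / N, sum(Y) / N
    Sxx = sum((X .- x̄) .^ 2) / N                  # centered second moment of x
    Sxy = sum((X .- x̄) .* (Y .- ȳ)) / N           # centered cross moment
    w = Sxy / (Sxx + alpha)                        # closed-form slope
    dw_dy = (X .- x̄) ./ (N * (Sxx + alpha))       # ∂w/∂y_i
    dw_dx = ((Y .- ȳ) .- 2w .* (X .- x̄)) ./ (N * (Sxx + alpha))  # ∂w/∂x_i
    return w, dw_dx, dw_dy
end
```

Up to solver tolerances, dw_dx and dw_dy should match the ∇x and ∇y vectors computed above.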

Visualize the sensitivity of the slope with respect to each data point.

p = Plots.scatter(
    X,
    Y;
    color = [dw < 0 ? :blue : :red for dw in ∇x],
    markersize = [5 * abs(dw) + 1.2 for dw in ∇x],
    label = "",
)
mi, ma = minimum(X), maximum(X)
Plots.plot!(
    p,
    [mi, ma],
    [mi * ŵ + b̂, ma * ŵ + b̂];
    color = :blue,
    label = "",
)
Plots.title!("Regression slope sensitivity with respect to x")
p = Plots.scatter(
    X,
    Y;
    color = [dw < 0 ? :blue : :red for dw in ∇y],
    markersize = [5 * abs(dw) + 1.2 for dw in ∇y],
    label = "",
)
mi, ma = minimum(X), maximum(X)
Plots.plot!(
    p,
    [mi, ma],
    [mi * ŵ + b̂, ma * ŵ + b̂];
    color = :blue,
    label = "",
)
Plots.title!("Regression slope sensitivity with respect to y")

Note that points whose x values are farther from the mean induce a greater sensitivity of the slope with respect to y.
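This observation agrees with a closed-form calculation: for the 1/N-scaled objective implemented here the optimal slope is \(\hat{w} = S_{xy}/(S_{xx} + \alpha)\), with \(S_{xx}\) and \(S_{xy}\) the centered second moments of the data, and differentiating with respect to \(y_i\) gives

\[\frac{\partial \hat{w}}{\partial y_i} = \frac{x_i - \bar{x}}{N (S_{xx} + \alpha)},\]

so the magnitude of the sensitivity grows linearly with the distance of \(x_i\) from the mean \(\bar{x}\).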


This page was generated using Literate.jl.