Logistic regression

using DataFrames
using Plots
using RDatasets
using Convex
using SCS

This is an example logistic regression using RDatasets's iris data. Our goal is to predict whether the iris species is versicolor using the sepal length and width and petal length and width.

iris = dataset("datasets", "iris");
iris[1:10, :]
10×5 DataFrame
RowSepalLengthSepalWidthPetalLengthPetalWidthSpecies
Float64Float64Float64Float64Cat…
15.13.51.40.2setosa
24.93.01.40.2setosa
34.73.21.30.2setosa
44.63.11.50.2setosa
55.03.61.40.2setosa
65.43.91.70.4setosa
74.63.41.40.3setosa
85.03.41.50.2setosa
94.42.91.40.2setosa
104.93.11.50.1setosa

We'll define Y as the outcome variable: +1 for versicolor, -1 otherwise.

Y = [species == "versicolor" ? 1.0 : -1.0 for species in iris.Species]
150-element Vector{Float64}:
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
  ⋮
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0

We'll create our data matrix with one column for each feature (first column corresponds to offset).

X = hcat(
    ones(size(iris, 1)),
    iris.SepalLength,
    iris.SepalWidth,
    iris.PetalLength,
    iris.PetalWidth,
);

Now to solve the logistic regression problem.

n, p = size(X)
beta = Variable(p)
problem = minimize(logisticloss(-Y .* (X * beta)))
solve!(problem, SCS.Optimizer; silent = true)
Problem statistics
  problem is DCP         : true
  number of variables    : 1 (5 scalar elements)
  number of constraints  : 0 (0 scalar elements)
  number of coefficients : 1_050
  number of atoms        : 453

Solution summary
  termination status : OPTIMAL
  primal status      : FEASIBLE_POINT
  dual status        : FEASIBLE_POINT
  objective value    : 72.535

Expression graph
  minimize
   └─ + (convex; real)
      ├─ logsumexp (convex; real)
      │  └─ vcat (affine; real)
      │     ├─ …
      │     └─ …
      ├─ logsumexp (convex; real)
      │  └─ vcat (affine; real)
      │     ├─ …
      │     └─ …
      ├─ logsumexp (convex; real)
      │  └─ vcat (affine; real)
      │     ├─ …
      │     └─ …
      ⋮

Let's see how well the model fits.

using Plots
logistic(x::Real) = inv(exp(-x) + one(x))
perm = sortperm(vec(X * evaluate(beta)))
plot(1:n, (Y[perm] .+ 1) / 2, st = :scatter)
plot!(1:n, logistic.(X * evaluate(beta))[perm])
Example block output

This page was generated using Literate.jl.