All of the examples can be found in Jupyter notebook form here.

Logistic regression

using DataFrames
using Plots
using RDatasets
using Convex
using SCS

This is an example logistic regression using RDatasets's iris data. Our goal is to gredict whether the iris species is versicolor using the sepal length and width and petal length and width.

iris = dataset("datasets", "iris");
iris[1:10,:]

10 rows × 5 columns

SepalLengthSepalWidthPetalLengthPetalWidthSpecies
Float64Float64Float64Float64Categorical…
15.13.51.40.2setosa
24.93.01.40.2setosa
34.73.21.30.2setosa
44.63.11.50.2setosa
55.03.61.40.2setosa
65.43.91.70.4setosa
74.63.41.40.3setosa
85.03.41.50.2setosa
94.42.91.40.2setosa
104.93.11.50.1setosa

We'll define Y as the outcome variable: +1 for versicolor, -1 otherwise.

Y = [species == "versicolor" ? 1.0 : -1.0 for species in iris.Species]
150-element Array{Float64,1}:
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
  ⋮  
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0
 -1.0

We'll create our data matrix with one column for each feature (first column corresponds to offset).

X = hcat(ones(size(iris, 1)), iris.SepalLength, iris.SepalWidth, iris.PetalLength, iris.PetalWidth);

Now to solve the logistic regression problem.

n, p = size(X)
beta = Variable(p)
problem = minimize(logisticloss(-Y.*(X*beta)))
solve!(problem, () -> SCS.Optimizer(verbose=false))

Let's see how well the model fits.

using Plots
logistic(x::Real) = inv(exp(-x) + one(x))
perm = sortperm(vec(X*beta.value))
plot(1:n, (Y[perm] .+ 1)/2, st=:scatter)
plot!(1:n, logistic.(X*beta.value)[perm])
0501001500.000.250.500.751.00y1y2

This page was generated using Literate.jl.