All of the examples can be found in Jupyter notebook form here.
Logistic regression
```julia
using DataFrames
using Plots
using RDatasets
using Convex
using SCS
```
This is an example of logistic regression using the iris data from RDatasets. Our goal is to predict whether the iris species is versicolor using the sepal length and width and the petal length and width.
```julia
iris = dataset("datasets", "iris");
iris[1:10, :]
```
10 rows × 5 columns

| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species      |
|-----|-------------|------------|-------------|------------|--------------|
|     | Float64     | Float64    | Float64     | Float64    | Categorical… |
| 1   | 5.1         | 3.5        | 1.4         | 0.2        | setosa       |
| 2   | 4.9         | 3.0        | 1.4         | 0.2        | setosa       |
| 3   | 4.7         | 3.2        | 1.3         | 0.2        | setosa       |
| 4   | 4.6         | 3.1        | 1.5         | 0.2        | setosa       |
| 5   | 5.0         | 3.6        | 1.4         | 0.2        | setosa       |
| 6   | 5.4         | 3.9        | 1.7         | 0.4        | setosa       |
| 7   | 4.6         | 3.4        | 1.4         | 0.3        | setosa       |
| 8   | 5.0         | 3.4        | 1.5         | 0.2        | setosa       |
| 9   | 4.4         | 2.9        | 1.4         | 0.2        | setosa       |
| 10  | 4.9         | 3.1        | 1.5         | 0.1        | setosa       |
We'll define `Y` as the outcome variable: +1 for versicolor, -1 otherwise.
```julia
Y = [species == "versicolor" ? 1.0 : -1.0 for species in iris.Species]
```
```
150-element Array{Float64,1}:
 -1.0
 -1.0
 -1.0
  ⋮
 -1.0
 -1.0
 -1.0
```
We'll create our data matrix with one column for each feature; the first column is all ones and corresponds to the offset (intercept) term.
```julia
X = hcat(ones(size(iris, 1)), iris.SepalLength, iris.SepalWidth, iris.PetalLength, iris.PetalWidth);
```
Now we solve the logistic regression problem.
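Applying Convex.jl's `logisticloss` atom to `-Y .* (X * beta)` gives the usual logistic-regression negative log-likelihood, which is convex in the coefficient vector:

```math
\min_{\beta} \; \sum_{i=1}^{n} \log\bigl(1 + \exp(-y_i \, x_i^\top \beta)\bigr)
```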
```julia
n, p = size(X)
beta = Variable(p)                                     # coefficient vector, including the offset
problem = minimize(logisticloss(-Y .* (X * beta)))     # logistic-regression negative log-likelihood
solve!(problem, () -> SCS.Optimizer(verbose=false))    # solve with SCS, suppressing solver output
```
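After solving, it can be worth checking the termination status and the achieved objective value; a minimal sketch using the fields Convex.jl exposes on the problem object:

```julia
# Quick sanity check on the solve (assumes the problem above was just solved).
problem.status   # termination status reported by the solver
problem.optval   # optimal value of the logistic loss
```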
Let's see how well the model fits.
```julia
using Plots
logistic(x::Real) = inv(exp(-x) + one(x))    # the logistic (sigmoid) function 1 / (1 + e^(-x))
perm = sortperm(vec(X * beta.value))         # order observations by their fitted linear score
plot(1:n, (Y[perm] .+ 1) / 2, st=:scatter)   # observed labels, mapped from {-1, +1} to {0, 1}
plot!(1:n, logistic.(X * beta.value)[perm])  # fitted probabilities of versicolor
```
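As a complementary numeric check (not part of the original example), one could also compute the in-sample accuracy of the fitted classifier; a minimal sketch, assuming `beta.value` holds the fitted coefficients:

```julia
using Statistics

# Classify as versicolor (+1) when the fitted linear score is positive, -1 otherwise,
# and compare against the observed labels.
predicted = [s > 0 ? 1.0 : -1.0 for s in vec(X * beta.value)]
accuracy = mean(predicted .== Y)
```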
This page was generated using Literate.jl.