The diet problem

This tutorial was generated using Literate.jl. Download the source as a .jl file.

The purpose of this tutorial is to demonstrate how to incorporate DataFrames into a JuMP model. As an example, we use classic Stigler diet problem.

Required packages

This tutorial requires the following packages:

using JuMP
import CSV
import DataFrames
import HiGHS
import Test

Formulation

We wish to cook a nutritionally balanced meal by choosing the quantity of each food $f$ to eat from a set of foods $F$ in our kitchen.

Each food $f$ has a cost, $c_f$, as well as a macro-nutrient profile $a_{m,f}$ for each macro-nutrient $m \in M$.

Because we care about a nutritionally balanced meal, we set some minimum and maximum limits for each nutrient, which we denote $l_m$ and $u_m$ respectively.

Furthermore, because we are optimizers, we seek the minimum cost solution.

With a little effort, we can formulate our dinner problem as the following linear program:

\[\begin{aligned} \min & \sum\limits_{f \in F} c_f x_f \\ \text{s.t.}\ \ & l_m \le \sum\limits_{f \in F} a_{m,f} x_f \le u_m, && \forall m \in M \\ & x_f \ge 0, && \forall f \in F. \end{aligned}\]

In the rest of this tutorial, we will create and solve this problem in JuMP, and learn what we should cook for dinner.

Data

First, we need some data for the problem. For this tutorial, we'll write CSV files to a temporary directory from Julia. If you have existing files, you could change the filenames to point to them instead.

dir = mktempdir()
"/tmp/jl_p6fK8k"

The first file is a list of foods with their macro-nutrient profile:

food_csv_filename = joinpath(dir, "diet_foods.csv")
open(food_csv_filename, "w") do io
    write(
        io,
        """
        name,cost,calories,protein,fat,sodium
        hamburger,2.49,410,24,26,730
        chicken,2.89,420,32,10,1190
        hot dog,1.50,560,20,32,1800
        fries,1.89,380,4,19,270
        macaroni,2.09,320,12,10,930
        pizza,1.99,320,15,12,820
        salad,2.49,320,31,12,1230
        milk,0.89,100,8,2.5,125
        ice cream,1.59,330,8,10,180
        """,
    )
    return
end
foods = CSV.read(food_csv_filename, DataFrames.DataFrame)
9×6 DataFrame
Rownamecostcaloriesproteinfatsodium
String15Float64Int64Int64Float64Int64
1hamburger2.494102426.0730
2chicken2.894203210.01190
3hot dog1.55602032.01800
4fries1.89380419.0270
5macaroni2.093201210.0930
6pizza1.993201512.0820
7salad2.493203112.01230
8milk0.8910082.5125
9ice cream1.59330810.0180

Here, $F$ is foods.name and $c_f$ is foods.cost. (We're also playing a bit loose the term "macro-nutrient" by including calories and sodium.)

We also need our minimum and maximum limits:

nutrient_csv_filename = joinpath(dir, "diet_nutrient.csv")
open(nutrient_csv_filename, "w") do io
    write(
        io,
        """
        nutrient,min,max
        calories,1800,2200
        protein,91,
        fat,0,65
        sodium,0,1779
        """,
    )
    return
end
limits = CSV.read(nutrient_csv_filename, DataFrames.DataFrame)
4×3 DataFrame
Rownutrientminmax
String15Int64Int64?
1calories18002200
2protein91missing
3fat065
4sodium01779

Protein is missing data for the maximum. Let's fix that using coalesce:

limits.max = coalesce.(limits.max, Inf)
limits
4×3 DataFrame
Rownutrientminmax
String15Int64Real
1calories18002200
2protein91Inf
3fat065
4sodium01779

JuMP formulation

Now we're ready to convert our mathematical formulation into a JuMP model.

First, create a new JuMP model. Since we have a linear program, we'll use HiGHS as our optimizer:

model = Model(HiGHS.Optimizer)
set_silent(model)

Next, we create a set of decision variables x, with one element for each row in the DataFrame, and each x has a lower bound of 0:

@variable(model, x[foods.name] >= 0)
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
    Dimension 1, InlineStrings.String15["hamburger", "chicken", "hot dog", "fries", "macaroni", "pizza", "salad", "milk", "ice cream"]
And data, a 9-element Vector{VariableRef}:
 x[hamburger]
 x[chicken]
 x[hot dog]
 x[fries]
 x[macaroni]
 x[pizza]
 x[salad]
 x[milk]
 x[ice cream]

To simplify things later on, we store the vector as a new column x in the DataFrame foods. Since x is a DenseAxisArray, we first need to convert it to an Array:

foods.x = Array(x)
9-element Vector{VariableRef}:
 x[hamburger]
 x[chicken]
 x[hot dog]
 x[fries]
 x[macaroni]
 x[pizza]
 x[salad]
 x[milk]
 x[ice cream]

Our objective is to minimize the total cost of purchasing food:

@objective(model, Min, sum(foods.cost .* foods.x));

For the next component, we need to add a constraint that our total intake of each component is within the limits contained in the limits DataFrame:

@constraint(
    model,
    [row in eachrow(limits)],
    row.min <= sum(foods[!, row.nutrient] .* foods.x) <= row.max,
);

What does our model look like?

print(model)
Min 2.49 x[hamburger] + 2.89 x[chicken] + 1.5 x[hot dog] + 1.89 x[fries] + 2.09 x[macaroni] + 1.99 x[pizza] + 2.49 x[salad] + 0.89 x[milk] + 1.59 x[ice cream]
Subject to
 410 x[hamburger] + 420 x[chicken] + 560 x[hot dog] + 380 x[fries] + 320 x[macaroni] + 320 x[pizza] + 320 x[salad] + 100 x[milk] + 330 x[ice cream] ∈ [1800, 2200]
 24 x[hamburger] + 32 x[chicken] + 20 x[hot dog] + 4 x[fries] + 12 x[macaroni] + 15 x[pizza] + 31 x[salad] + 8 x[milk] + 8 x[ice cream] ∈ [91, Inf]
 26 x[hamburger] + 10 x[chicken] + 32 x[hot dog] + 19 x[fries] + 10 x[macaroni] + 12 x[pizza] + 12 x[salad] + 2.5 x[milk] + 10 x[ice cream] ∈ [0, 65]
 730 x[hamburger] + 1190 x[chicken] + 1800 x[hot dog] + 270 x[fries] + 930 x[macaroni] + 820 x[pizza] + 1230 x[salad] + 125 x[milk] + 180 x[ice cream] ∈ [0, 1779]
 x[hamburger] ≥ 0
 x[chicken] ≥ 0
 x[hot dog] ≥ 0
 x[fries] ≥ 0
 x[macaroni] ≥ 0
 x[pizza] ≥ 0
 x[salad] ≥ 0
 x[milk] ≥ 0
 x[ice cream] ≥ 0

Solution

Let's optimize and take a look at the solution:

optimize!(model)
@assert is_solved_and_feasible(model)
solution_summary(model)
* Solver : HiGHS

* Status
  Result count       : 1
  Termination status : OPTIMAL
  Message from the solver:
  "kHighsModelStatusOptimal"

* Candidate solution (result #1)
  Primal status      : FEASIBLE_POINT
  Dual status        : FEASIBLE_POINT
  Objective value    : 1.18289e+01
  Objective bound    : 0.00000e+00
  Relative gap       : Inf
  Dual objective value : 1.18289e+01

* Work counters
  Solve time (sec)   : 2.16722e-04
  Simplex iterations : 6
  Barrier iterations : 0
  Node count         : -1

We found an optimal solution. Let's see what the optimal solution is:

for row in eachrow(foods)
    println(row.name, " = ", value(row.x))
end
hamburger = 0.6045138888888871
chicken = 0.0
hot dog = 0.0
fries = 0.0
macaroni = 0.0
pizza = 0.0
salad = 0.0
milk = 6.9701388888888935
ice cream = 2.5913194444444447

That's a lot of milk and ice cream, and sadly, we only get 0.6 of a hamburger.

We can also use the function Containers.rowtable to easily convert the result into a DataFrame:

table = Containers.rowtable(value, x; header = [:food, :quantity])
solution = DataFrames.DataFrame(table)
9×2 DataFrame
Rowfoodquantity
String15Float64
1hamburger0.604514
2chicken0.0
3hot dog0.0
4fries0.0
5macaroni0.0
6pizza0.0
7salad0.0
8milk6.97014
9ice cream2.59132

This makes it easy to perform analyses our solution:

filter!(row -> row.quantity > 0.0, solution)
3×2 DataFrame
Rowfoodquantity
String15Float64
1hamburger0.604514
2milk6.97014
3ice cream2.59132

Problem modification

JuMP makes it easy to take an existing model and modify it by adding extra constraints. Let's see what happens if we add a constraint that we can buy at most 6 units of milk or ice cream combined.

dairy_foods = ["milk", "ice cream"]
is_dairy = map(name -> name in dairy_foods, foods.name)
dairy_constraint = @constraint(model, sum(foods[is_dairy, :x]) <= 6)
optimize!(model)
Test.@test !is_solved_and_feasible(model)
Test.@test termination_status(model) == INFEASIBLE
Test.@test primal_status(model) == NO_SOLUTION
solution_summary(model)
* Solver : HiGHS

* Status
  Result count       : 1
  Termination status : INFEASIBLE
  Message from the solver:
  "kHighsModelStatusInfeasible"

* Candidate solution (result #1)
  Primal status      : NO_SOLUTION
  Dual status        : INFEASIBILITY_CERTIFICATE
  Objective value    : 1.18289e+01
  Objective bound    : 0.00000e+00
  Relative gap       : Inf
  Dual objective value : 3.56146e+00

* Work counters
  Solve time (sec)   : 1.43528e-04
  Simplex iterations : 0
  Barrier iterations : 0
  Node count         : -1

There exists no feasible solution to our problem. Looks like we're stuck eating ice cream for dinner.

Next steps

  • You can delete a constraint using delete(model, dairy_constraint). Can you add a different constraint to provide a diet with less dairy?
  • Some food items (like hamburgers) are discrete. You can use set_integer to force a variable to take integer values. What happens to the solution if you do?