The diet problem
This tutorial was generated using Literate.jl. Download the source as a .jl
file.
The purpose of this tutorial is to demonstrate how to incorporate DataFrames into a JuMP model. As an example, we use classic Stigler diet problem.
Required packages
This tutorial requires the following packages:
using JuMP
import CSV
import DataFrames
import HiGHS
import Test
Formulation
We wish to cook a nutritionally balanced meal by choosing the quantity of each food $f$ to eat from a set of foods $F$ in our kitchen.
Each food $f$ has a cost, $c_f$, as well as a macro-nutrient profile $a_{m,f}$ for each macro-nutrient $m \in M$.
Because we care about a nutritionally balanced meal, we set some minimum and maximum limits for each nutrient, which we denote $l_m$ and $u_m$ respectively.
Furthermore, because we are optimizers, we seek the minimum cost solution.
With a little effort, we can formulate our dinner problem as the following linear program:
\[\begin{aligned} \min & \sum\limits_{f \in F} c_f x_f \\ \text{s.t.}\ \ & l_m \le \sum\limits_{f \in F} a_{m,f} x_f \le u_m, && \forall m \in M \\ & x_f \ge 0, && \forall f \in F. \end{aligned}\]
In the rest of this tutorial, we will create and solve this problem in JuMP, and learn what we should cook for dinner.
Data
First, we need some data for the problem. For this tutorial, we'll write CSV files to a temporary directory from Julia. If you have existing files, you could change the filenames to point to them instead.
dir = mktempdir()
"/tmp/jl_e267Im"
The first file is a list of foods with their macro-nutrient profile:
food_csv_filename = joinpath(dir, "diet_foods.csv")
open(food_csv_filename, "w") do io
write(
io,
"""
name,cost,calories,protein,fat,sodium
hamburger,2.49,410,24,26,730
chicken,2.89,420,32,10,1190
hot dog,1.50,560,20,32,1800
fries,1.89,380,4,19,270
macaroni,2.09,320,12,10,930
pizza,1.99,320,15,12,820
salad,2.49,320,31,12,1230
milk,0.89,100,8,2.5,125
ice cream,1.59,330,8,10,180
""",
)
return
end
foods = CSV.read(food_csv_filename, DataFrames.DataFrame)
Row | name | cost | calories | protein | fat | sodium |
---|---|---|---|---|---|---|
String15 | Float64 | Int64 | Int64 | Float64 | Int64 | |
1 | hamburger | 2.49 | 410 | 24 | 26.0 | 730 |
2 | chicken | 2.89 | 420 | 32 | 10.0 | 1190 |
3 | hot dog | 1.5 | 560 | 20 | 32.0 | 1800 |
4 | fries | 1.89 | 380 | 4 | 19.0 | 270 |
5 | macaroni | 2.09 | 320 | 12 | 10.0 | 930 |
6 | pizza | 1.99 | 320 | 15 | 12.0 | 820 |
7 | salad | 2.49 | 320 | 31 | 12.0 | 1230 |
8 | milk | 0.89 | 100 | 8 | 2.5 | 125 |
9 | ice cream | 1.59 | 330 | 8 | 10.0 | 180 |
Here, $F$ is foods.name
and $c_f$ is foods.cost
. (We're also playing a bit loose the term "macro-nutrient" by including calories and sodium.)
We also need our minimum and maximum limits:
nutrient_csv_filename = joinpath(dir, "diet_nutrient.csv")
open(nutrient_csv_filename, "w") do io
write(
io,
"""
nutrient,min,max
calories,1800,2200
protein,91,
fat,0,65
sodium,0,1779
""",
)
return
end
limits = CSV.read(nutrient_csv_filename, DataFrames.DataFrame)
Row | nutrient | min | max |
---|---|---|---|
String15 | Int64 | Int64? | |
1 | calories | 1800 | 2200 |
2 | protein | 91 | missing |
3 | fat | 0 | 65 |
4 | sodium | 0 | 1779 |
Protein is missing data for the maximum. Let's fix that using coalesce
:
limits.max = coalesce.(limits.max, Inf)
limits
Row | nutrient | min | max |
---|---|---|---|
String15 | Int64 | Real | |
1 | calories | 1800 | 2200 |
2 | protein | 91 | Inf |
3 | fat | 0 | 65 |
4 | sodium | 0 | 1779 |
JuMP formulation
Now we're ready to convert our mathematical formulation into a JuMP model.
First, create a new JuMP model. Since we have a linear program, we'll use HiGHS as our optimizer:
model = Model(HiGHS.Optimizer)
set_silent(model)
Next, we create a set of decision variables x
, with one element for each row in the DataFrame, and each x
has a lower bound of 0
:
@variable(model, x[foods.name] >= 0)
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, InlineStrings.String15["hamburger", "chicken", "hot dog", "fries", "macaroni", "pizza", "salad", "milk", "ice cream"]
And data, a 9-element Vector{VariableRef}:
x[hamburger]
x[chicken]
x[hot dog]
x[fries]
x[macaroni]
x[pizza]
x[salad]
x[milk]
x[ice cream]
To simplify things later on, we store the vector as a new column x
in the DataFrame foods
. Since x
is a DenseAxisArray
, we first need to convert it to an Array
:
foods.x = Array(x)
9-element Vector{VariableRef}:
x[hamburger]
x[chicken]
x[hot dog]
x[fries]
x[macaroni]
x[pizza]
x[salad]
x[milk]
x[ice cream]
Our objective is to minimize the total cost of purchasing food:
@objective(model, Min, sum(foods.cost .* foods.x));
For the next component, we need to add a constraint that our total intake of each component is within the limits contained in the limits
DataFrame:
@constraint(
model,
[row in eachrow(limits)],
row.min <= sum(foods[!, row.nutrient] .* foods.x) <= row.max,
);
What does our model look like?
print(model)
Min 2.49 x[hamburger] + 2.89 x[chicken] + 1.5 x[hot dog] + 1.89 x[fries] + 2.09 x[macaroni] + 1.99 x[pizza] + 2.49 x[salad] + 0.89 x[milk] + 1.59 x[ice cream]
Subject to
410 x[hamburger] + 420 x[chicken] + 560 x[hot dog] + 380 x[fries] + 320 x[macaroni] + 320 x[pizza] + 320 x[salad] + 100 x[milk] + 330 x[ice cream] ∈ [1800, 2200]
24 x[hamburger] + 32 x[chicken] + 20 x[hot dog] + 4 x[fries] + 12 x[macaroni] + 15 x[pizza] + 31 x[salad] + 8 x[milk] + 8 x[ice cream] ∈ [91, Inf]
26 x[hamburger] + 10 x[chicken] + 32 x[hot dog] + 19 x[fries] + 10 x[macaroni] + 12 x[pizza] + 12 x[salad] + 2.5 x[milk] + 10 x[ice cream] ∈ [0, 65]
730 x[hamburger] + 1190 x[chicken] + 1800 x[hot dog] + 270 x[fries] + 930 x[macaroni] + 820 x[pizza] + 1230 x[salad] + 125 x[milk] + 180 x[ice cream] ∈ [0, 1779]
x[hamburger] ≥ 0
x[chicken] ≥ 0
x[hot dog] ≥ 0
x[fries] ≥ 0
x[macaroni] ≥ 0
x[pizza] ≥ 0
x[salad] ≥ 0
x[milk] ≥ 0
x[ice cream] ≥ 0
Solution
Let's optimize and take a look at the solution:
optimize!(model)
@assert is_solved_and_feasible(model)
solution_summary(model)
* Solver : HiGHS
* Status
Result count : 1
Termination status : OPTIMAL
Message from the solver:
"kHighsModelStatusOptimal"
* Candidate solution (result #1)
Primal status : FEASIBLE_POINT
Dual status : FEASIBLE_POINT
Objective value : 1.18289e+01
Objective bound : 1.18289e+01
Relative gap : 0.00000e+00
Dual objective value : 1.18289e+01
* Work counters
Solve time (sec) : 2.00987e-04
Simplex iterations : 6
Barrier iterations : 0
Node count : -1
We found an optimal solution. Let's see what the optimal solution is:
for row in eachrow(foods)
println(row.name, " = ", value(row.x))
end
hamburger = 0.6045138888888871
chicken = 0.0
hot dog = 0.0
fries = 0.0
macaroni = 0.0
pizza = 0.0
salad = 0.0
milk = 6.9701388888888935
ice cream = 2.5913194444444447
That's a lot of milk and ice cream, and sadly, we only get 0.6
of a hamburger.
We can also use the function Containers.rowtable
to easily convert the result into a DataFrame:
table = Containers.rowtable(value, x; header = [:food, :quantity])
solution = DataFrames.DataFrame(table)
Row | food | quantity |
---|---|---|
String15 | Float64 | |
1 | hamburger | 0.604514 |
2 | chicken | 0.0 |
3 | hot dog | 0.0 |
4 | fries | 0.0 |
5 | macaroni | 0.0 |
6 | pizza | 0.0 |
7 | salad | 0.0 |
8 | milk | 6.97014 |
9 | ice cream | 2.59132 |
This makes it easy to perform analyses our solution:
filter!(row -> row.quantity > 0.0, solution)
Row | food | quantity |
---|---|---|
String15 | Float64 | |
1 | hamburger | 0.604514 |
2 | milk | 6.97014 |
3 | ice cream | 2.59132 |
Problem modification
JuMP makes it easy to take an existing model and modify it by adding extra constraints. Let's see what happens if we add a constraint that we can buy at most 6 units of milk or ice cream combined.
dairy_foods = ["milk", "ice cream"]
is_dairy = map(name -> name in dairy_foods, foods.name)
dairy_constraint = @constraint(model, sum(foods[is_dairy, :x]) <= 6)
optimize!(model)
Test.@test !is_solved_and_feasible(model)
Test.@test termination_status(model) == INFEASIBLE
Test.@test primal_status(model) == NO_SOLUTION
solution_summary(model)
* Solver : HiGHS
* Status
Result count : 1
Termination status : INFEASIBLE
Message from the solver:
"kHighsModelStatusInfeasible"
* Candidate solution (result #1)
Primal status : NO_SOLUTION
Dual status : INFEASIBILITY_CERTIFICATE
Objective value : 1.18289e+01
Objective bound : 0.00000e+00
Relative gap : Inf
Dual objective value : 3.56146e+00
* Work counters
Solve time (sec) : 1.46389e-04
Simplex iterations : 0
Barrier iterations : 0
Node count : -1
There exists no feasible solution to our problem. Looks like we're stuck eating ice cream for dinner.
Next steps
- You can delete a constraint using
delete(model, dairy_constraint)
. Can you add a different constraint to provide a diet with less dairy? - Some food items (like hamburgers) are discrete. You can use
set_integer
to force a variable to take integer values. What happens to the solution if you do?