The diet problem

This tutorial was generated using Literate.jl. Download the source as a .jl file.

The purpose of this tutorial is to demonstrate how to incorporate DataFrames into a JuMP model. As an example, we use classic Stigler diet problem.

Required packages

This tutorial requires the following packages:

using JuMP
import CSV
import DataFrames
import HiGHS
import Test

Formulation

We wish to cook a nutritionally balanced meal by choosing the quantity of each food $f$ to eat from a set of foods $F$ in our kitchen.

Each food $f$ has a cost, $c_f$, as well as a macro-nutrient profile $a_{m,f}$ for each macro-nutrient $m \in M$.

Because we care about a nutritionally balanced meal, we set some minimum and maximum limits for each nutrient, which we denote $l_m$ and $u_m$ respectively.

Furthermore, because we are optimizers, we seek the minimum cost solution.

With a little effort, we can formulate our dinner problem as the following linear program:

\[\begin{aligned} \min & \sum\limits_{f \in F} c_f x_f \\ \text{s.t.}\ \ & l_m \le \sum\limits_{f \in F} a_{m,f} x_f \le u_m, && \forall m \in M \\ & x_f \ge 0, && \forall f \in F. \end{aligned}\]

In the rest of this tutorial, we will create and solve this problem in JuMP, and learn what we should cook for dinner.

First, we need some data for the problem. For this tutorial, we'll write CSV files to a temporary directory from Julia. If you have existing files, you could change the filenames to point to them instead.

dir = mktempdir()

"/tmp/jl_zJTXvt"

The first file is a list of foods with their macro-nutrient profile:

food_csv_filename = joinpath(dir, "diet_foods.csv")
open(food_csv_filename, "w") do io
    write(
        io,
        """
        name,cost,calories,protein,fat,sodium
        hamburger,2.49,410,24,26,730
        chicken,2.89,420,32,10,1190
        hot dog,1.50,560,20,32,1800
        fries,1.89,380,4,19,270
        macaroni,2.09,320,12,10,930
        pizza,1.99,320,15,12,820
        salad,2.49,320,31,12,1230
        milk,0.89,100,8,2.5,125
        ice cream,1.59,330,8,10,180
        """,
    )
    return
end
foods = CSV.read(food_csv_filename, DataFrames.DataFrame)

9×6 DataFrame

Row	name	cost	calories	protein	fat	sodium
	String15	Float64	Int64	Int64	Float64	Int64
1	hamburger	2.49	410	24	26.0	730
2	chicken	2.89	420	32	10.0	1190
3	hot dog	1.5	560	20	32.0	1800
4	fries	1.89	380	4	19.0	270
5	macaroni	2.09	320	12	10.0	930
6	pizza	1.99	320	15	12.0	820
7	salad	2.49	320	31	12.0	1230
8	milk	0.89	100	8	2.5	125
9	ice cream	1.59	330	8	10.0	180

Here, $F$ is foods.name and $c_f$ is foods.cost. (We're also playing a bit loose the term "macro-nutrient" by including calories and sodium.)

We also need our minimum and maximum limits:

nutrient_csv_filename = joinpath(dir, "diet_nutrient.csv")
open(nutrient_csv_filename, "w") do io
    write(
        io,
        """
        nutrient,min,max
        calories,1800,2200
        protein,91,
        fat,0,65
        sodium,0,1779
        """,
    )
    return
end
limits = CSV.read(nutrient_csv_filename, DataFrames.DataFrame)

4×3 DataFrame

Row	nutrient	min	max
	String15	Int64	Int64?
1	calories	1800	2200
2	protein	91	missing
3	fat	0	65
4	sodium	0	1779

Protein is missing data for the maximum. Let's fix that using coalesce:

limits.max = coalesce.(limits.max, Inf)
limits

4×3 DataFrame

Row	nutrient	min	max
	String15	Int64	Real
1	calories	1800	2200
2	protein	91	Inf
3	fat	0	65
4	sodium	0	1779

JuMP formulation

Now we're ready to convert our mathematical formulation into a JuMP model.

First, create a new JuMP model. Since we have a linear program, we'll use HiGHS as our optimizer:

model = Model(HiGHS.Optimizer)
set_silent(model)

Next, we create a set of decision variables x, with one element for each row in the DataFrame, and each x has a lower bound of 0:

@variable(model, x[foods.name] >= 0)

1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
    Dimension 1, InlineStrings.String15["hamburger", "chicken", "hot dog", "fries", "macaroni", "pizza", "salad", "milk", "ice cream"]
And data, a 9-element Vector{VariableRef}:
 x[hamburger]
 x[chicken]
 x[hot dog]
 x[fries]
 x[macaroni]
 x[pizza]
 x[salad]
 x[milk]
 x[ice cream]

To simplify things later on, we store the vector as a new column x in the DataFrame foods. Since x is a DenseAxisArray, we first need to convert it to an Array:

foods.x = Array(x)

9-element Vector{VariableRef}:
 x[hamburger]
 x[chicken]
 x[hot dog]
 x[fries]
 x[macaroni]
 x[pizza]
 x[salad]
 x[milk]
 x[ice cream]

Our objective is to minimize the total cost of purchasing food:

@objective(model, Min, sum(foods.cost .* foods.x));

For the next component, we need to add a constraint that our total intake of each component is within the limits contained in the limits DataFrame:

@constraint(
    model,
    [row in eachrow(limits)],
    row.min <= sum(foods[!, row.nutrient] .* foods.x) <= row.max,
);

What does our model look like?

print(model)

Min 2.49 x[hamburger] + 2.89 x[chicken] + 1.5 x[hot dog] + 1.89 x[fries] + 2.09 x[macaroni] + 1.99 x[pizza] + 2.49 x[salad] + 0.89 x[milk] + 1.59 x[ice cream]
Subject to
 410 x[hamburger] + 420 x[chicken] + 560 x[hot dog] + 380 x[fries] + 320 x[macaroni] + 320 x[pizza] + 320 x[salad] + 100 x[milk] + 330 x[ice cream] ∈ [1800, 2200]
 24 x[hamburger] + 32 x[chicken] + 20 x[hot dog] + 4 x[fries] + 12 x[macaroni] + 15 x[pizza] + 31 x[salad] + 8 x[milk] + 8 x[ice cream] ∈ [91, Inf]
 26 x[hamburger] + 10 x[chicken] + 32 x[hot dog] + 19 x[fries] + 10 x[macaroni] + 12 x[pizza] + 12 x[salad] + 2.5 x[milk] + 10 x[ice cream] ∈ [0, 65]
 730 x[hamburger] + 1190 x[chicken] + 1800 x[hot dog] + 270 x[fries] + 930 x[macaroni] + 820 x[pizza] + 1230 x[salad] + 125 x[milk] + 180 x[ice cream] ∈ [0, 1779]
 x[hamburger] ≥ 0
 x[chicken] ≥ 0
 x[hot dog] ≥ 0
 x[fries] ≥ 0
 x[macaroni] ≥ 0
 x[pizza] ≥ 0
 x[salad] ≥ 0
 x[milk] ≥ 0
 x[ice cream] ≥ 0

Solution

Let's optimize and take a look at the solution:

optimize!(model)
assert_is_solved_and_feasible(model)
solution_summary(model)

solution_summary(; result = 1, verbose = false)
├ solver_name          : HiGHS
├ Termination
│ ├ termination_status : OPTIMAL
│ ├ result_count       : 1
│ ├ raw_status         : kHighsModelStatusOptimal
│ └ objective_bound    : 1.18289e+01
├ Solution (result = 1)
│ ├ primal_status        : FEASIBLE_POINT
│ ├ dual_status          : FEASIBLE_POINT
│ ├ objective_value      : 1.18289e+01
│ ├ dual_objective_value : 1.18289e+01
│ └ relative_gap         : 0.00000e+00
└ Work counters
  ├ solve_time (sec)   : 2.21968e-04
  ├ simplex_iterations : 6
  ├ barrier_iterations : 0
  └ node_count         : -1

We found an optimal solution. Let's see what the optimal solution is:

for row in eachrow(foods)
    println(row.name, " = ", value(row.x))
end

hamburger = 0.6045138888888871
chicken = 0.0
hot dog = 0.0
fries = 0.0
macaroni = 0.0
pizza = 0.0
salad = 0.0
milk = 6.9701388888888935
ice cream = 2.5913194444444447

That's a lot of milk and ice cream, and sadly, we only get 0.6 of a hamburger.

We can also use the function Containers.rowtable to easily convert the result into a DataFrame:

table = Containers.rowtable(value, x; header = [:food, :quantity])
solution = DataFrames.DataFrame(table)

9×2 DataFrame

Row	food	quantity
	String15	Float64
1	hamburger	0.604514
2	chicken	0.0
3	hot dog	0.0
4	fries	0.0
5	macaroni	0.0
6	pizza	0.0
7	salad	0.0
8	milk	6.97014
9	ice cream	2.59132

This makes it easy to perform analyses our solution:

filter!(row -> row.quantity > 0.0, solution)

3×2 DataFrame

Row	food	quantity
	String15	Float64
1	hamburger	0.604514
2	milk	6.97014
3	ice cream	2.59132

Problem modification

JuMP makes it easy to take an existing model and modify it by adding extra constraints. Let's see what happens if we add a constraint that we can buy at most 6 units of milk or ice cream combined.

dairy_foods = ["milk", "ice cream"]
is_dairy = map(name -> name in dairy_foods, foods.name)
dairy_constraint = @constraint(model, sum(foods[is_dairy, :x]) <= 6)
optimize!(model)
Test.@test !is_solved_and_feasible(model)
Test.@test termination_status(model) == INFEASIBLE
Test.@test primal_status(model) == NO_SOLUTION
solution_summary(model)

solution_summary(; result = 1, verbose = false)
├ solver_name          : HiGHS
├ Termination
│ ├ termination_status : INFEASIBLE
│ ├ result_count       : 1
│ ├ raw_status         : kHighsModelStatusInfeasible
│ └ objective_bound    : 0.00000e+00
├ Solution (result = 1)
│ ├ primal_status        : NO_SOLUTION
│ ├ dual_status          : INFEASIBILITY_CERTIFICATE
│ ├ objective_value      : 1.18289e+01
│ ├ dual_objective_value : 3.56146e+00
│ └ relative_gap         : Inf
└ Work counters
  ├ solve_time (sec)   : 1.55210e-04
  ├ simplex_iterations : 0
  ├ barrier_iterations : 0
  └ node_count         : -1

There exists no feasible solution to our problem. Looks like we're stuck eating ice cream for dinner.

Next steps

You can delete a constraint using delete(model, dairy_constraint). Can you add a different constraint to provide a diet with less dairy?
Some food items (like hamburgers) are discrete. You can use set_integer to force a variable to take integer values. What happens to the solution if you do?