# The factory schedule example

This tutorial was generated using Literate.jl. Download the source as a .jl file.

This tutorial was originally contributed by @Crghilardi.

This tutorial is a Julia translation of Part 5 from Introduction to Linear Programming with Python.

The purpose of this tutorial is to demonstrate how to use DataFrames and delimited files, and to structure your code that is robust to infeasibilities and permits running with different datasets.

## Required packages

This tutorial requires the following packages:

using JuMP
import CSV
import DataFrames
import HiGHS
import StatsPlots

## Formulation

The Factory Scheduling Problem assumes we are optimizing the production of a good from factories $f \in F$ over the course of 12 months $m \in M$.

If a factory $f$ runs during a month $m$, a fixed cost of $a_f$ is incurred, the factory must produce $x_{m,f}$ units that is within some minimum and maximum production levels $l_f$ and $u_f$ respectively, and each unit of production incurs a variable cost $c_f$. Otherwise, the factory can be shut for the month with zero production and no fixed-cost is incurred. We denote the run/not-run decision by $z_{m,f} \in \{0, 1\}$, where $z_{m,f}$ is $1$ if factory $f$ runs in month $m$. The factory must produce enough units to satisfy demand $d_m$.

With a little effort, we can formulate our problem as the following linear program:

\begin{aligned} \min & \sum\limits_{f \in F, m \in M} a_f z_{m,f} + c_f x_{m,f} \\ \text{s.t.} & x_{m,f} \le u_f z_{m,f} && \forall f \in F, m \in M \\ & x_{m,f} \ge l_f z_{m,f} && \forall f \in F, m \in M \\ & \sum\limits_{f\in F} x_{m,f} = d_m && \forall f \in F, m \in M \\ & z_{m,f} \in \{0, 1\} && \forall f \in F, m \in M. \end{aligned}

However, this formulation has a problem: if demand is too high, we may be unable to satisfy the demand constraint, and the problem will be infeasible.

Tip

When modeling, consider ways to formulate your model such that it always has a feasible solution. This greatly simplifies debugging data errors that would otherwise result in an infeasible solution. In practice, most practical decisions have a feasible solution. In our case, we could satisfy demand (at a high cost) by buying replacement items for the buyer, or running the factories in overtime to make up the difference.

We can improve our model by adding a new variable, $\delta_m$, which represents the quantity of unmet demand in each month $m$. We penalize $\delta_m$ by an arbitrarily large value of 10,000/unit in the objective. \begin{aligned} \min & \sum\limits_{f \in F, m \in M} a_f z_{m,f} + c_f x_{m,f} + \sum\limits_{m \in M}10000 \delta_m \\ \text{s.t.} & x_{m,f} \le u_f z_{m,f} && \forall f \in F, m \in M \\ & x_{m,f} \ge l_f z_{m,f} && \forall f \in F, m \in M \\ & \sum\limits_{f\in F} x_{m,f} - \delta_m = d_m && \forall f \in F, m \in M \\ & z_{m,f} \in \{0, 1\} && \forall f \in F, m \in M \\ & \delta_m \ge 0 && \forall m \in M. \end{aligned} ## Data The JuMP GitHub repository contains two text files with the data we need for this tutorial. The first file contains a dataset of our factories, A and B, with their production and cost levels for each month. For the documentation, the file is located at: factories_filename = joinpath(@__DIR__, "factory_schedule_factories.txt"); To run locally, download factory_schedule_factories.txt and update factories_filename appropriately. The file has the following contents: print(read(factories_filename, String)) factory month min_production max_production fixed_cost variable_cost A 1 20000 100000 500 10 A 2 20000 110000 500 11 A 3 20000 120000 500 12 A 4 20000 145000 500 9 A 5 20000 160000 500 8 A 6 20000 140000 500 8 A 7 20000 155000 500 5 A 8 20000 200000 500 7 A 9 20000 210000 500 9 A 10 20000 197000 500 10 A 11 20000 80000 500 8 A 12 20000 150000 500 8 B 1 20000 50000 600 5 B 2 20000 55000 600 4 B 3 20000 60000 600 3 B 4 20000 100000 600 5 B 5 0 0 0 0 B 6 20000 70000 600 6 B 7 20000 60000 600 4 B 8 20000 100000 600 6 B 9 20000 100000 600 8 B 10 20000 100000 600 11 B 11 20000 120000 600 10 B 12 20000 150000 600 12 We use the CSV and DataFrames packages to read it into Julia: factory_df = CSV.read( factories_filename, DataFrames.DataFrame; delim = ' ', ignorerepeated = true, ) 24×6 DataFrame Rowfactorymonthmin_productionmax_productionfixed_costvariable_cost String1Int64Int64Int64Int64Int64 1A12000010000050010 2A22000011000050011 3A32000012000050012 4A4200001450005009 5A5200001600005008 6A6200001400005008 7A7200001550005005 8A8200002000005007 9A9200002100005009 10A102000019700050010 11A1120000800005008 12A12200001500005008 13B120000500006005 14B220000550006004 15B320000600006003 16B4200001000006005 17B50000 18B620000700006006 19B720000600006004 20B8200001000006006 21B9200001000006008 22B102000010000060011 23B112000012000060010 24B122000015000060012 The second file contains the demand data by month: demand_filename = joinpath(@__DIR__, "factory_schedule_demand.txt"); To run locally, download factory_schedule_demand.txt and update demand_filename appropriately. demand_df = CSV.read( demand_filename, DataFrames.DataFrame; delim = ' ', ignorerepeated = true, ) 12×2 DataFrame Rowmonthdemand Int64Int64 11120000 22100000 33130000 44130000 55140000 66130000 77150000 88170000 99200000 1010190000 1111140000 1212100000 ### Data validation Before moving on, it's always good practice to validate the data you read from external sources. The more effort you spend here, the fewer issues you will have later. The following function contains a few simple checks, but we could add more. For example, you might want to check that none of the values are too large (or too small), which might indicate a typo or a unit conversion issue (perhaps the variable costs are in/1000 units instead of \$/unit).

function valiate_data(
demand_df::DataFrames.DataFrame,
factory_df::DataFrames.DataFrame,
)
# Minimum production must not exceed maximum production.
@assert all(factory_df.min_production .<= factory_df.max_production)
# Demand, minimum production, fixed costs, and variable costs must all be
# non-negative.
@assert all(demand_df.demand .>= 0)
@assert all(factory_df.min_production .>= 0)
@assert all(factory_df.fixed_cost .>= 0)
@assert all(factory_df.variable_cost .>= 0)
return
end

valiate_data(demand_df, factory_df)

## JuMP formulation

Next, we need to code our JuMP formulation. As shown in Design patterns for larger models, it's always good practice to code your model in a function that accepts well-defined input and returns well-defined output.

function solve_factory_scheduling(
demand_df::DataFrames.DataFrame,
factory_df::DataFrames.DataFrame,
)
# Even though we validated the data above, it's good practice to do it here
# too.
valiate_data(demand_df, factory_df)
months, factories = unique(factory_df.month), unique(factory_df.factory)
model = Model(HiGHS.Optimizer)
set_silent(model)
@variable(model, status[months, factories], Bin)
@variable(model, production[months, factories], Int)
@variable(model, unmet_demand[months] >= 0)
# We use eachrow to loop through the rows of the dataframe and add the
# relevant constraints.
for r in eachrow(factory_df)
m, f = r.month, r.factory
@constraint(model, production[m, f] <= r.max_production * status[m, f])
@constraint(model, production[m, f] >= r.min_production * status[m, f])
end
@constraint(
model,
[r in eachrow(demand_df)],
sum(production[r.month, :]) + unmet_demand[r.month] == r.demand,
)
@objective(
model,
Min,
10_000 * sum(unmet_demand) + sum(
r.fixed_cost * status[r.month, r.factory] +
r.variable_cost * production[r.month, r.factory] for
r in eachrow(factory_df)
)
)
optimize!(model)
@assert is_solved_and_feasible(model)
schedules = Dict{Symbol,Vector{Float64}}(
Symbol(f) => value.(production[:, f]) for f in factories
)
schedules[:unmet_demand] = value.(unmet_demand)
return (
termination_status = termination_status(model),
cost = objective_value(model),
# This select statement re-orders the columns in the DataFrame.
schedules = DataFrames.select(
DataFrames.DataFrame(schedules),
[:unmet_demand, :A, :B],
),
)
end
solve_factory_scheduling (generic function with 1 method)

## Solution

Now we can call our solve_factory_scheduling function using the data we read in above.

solution = solve_factory_scheduling(demand_df, factory_df);

Let's see what solution contains:

solution.termination_status
OPTIMAL::TerminationStatusCode = 1
solution.cost
1.29064e7
solution.schedules
12×3 DataFrame
Rowunmet_demandAB
Float64Float64Float64
10.070000.050000.0
20.045000.055000.0
30.070000.060000.0
40.030000.0100000.0
50.0140000.0-0.0
60.060000.070000.0
70.090000.060000.0
80.070000.0100000.0
90.0100000.0100000.0
100.0190000.0-0.0
110.080000.060000.0
120.0100000.0-0.0

These schedules will be easier to visualize as a graph:

StatsPlots.groupedbar(
Matrix(solution.schedules);
bar_position = :stack,
labels = ["unmet demand" "A" "B"],
xlabel = "Month",
ylabel = "Production",
legend = :topleft,
color = ["#20326c" "#4063d8" "#a0b1ec"],
)

Note that we don't have any unmet demand.

## What happens if demand increases?

Let's run an experiment by increasing the demand by 50% in all time periods:

demand_df.demand .*= 1.5
12-element Vector{Float64}:
180000.0
150000.0
195000.0
195000.0
210000.0
195000.0
225000.0
255000.0
300000.0
285000.0
210000.0
150000.0

Now we resolve the problem:

high_demand_solution = solve_factory_scheduling(demand_df, factory_df);

and visualize the solution:

StatsPlots.groupedbar(
Matrix(high_demand_solution.schedules);
bar_position = :stack,
labels = ["unmet demand" "A" "B"],
xlabel = "Month",
ylabel = "Production",
legend = :topleft,
color = ["#20326c" "#4063d8" "#a0b1ec"],
)

Uh oh, we can't satisfy all of the demand.

## How sensitive is the solution to changes in variable cost?

Let's run another experiment, this time seeing how the optimal objective value changes as we vary the variable costs of each factory.

First though, let's reset the demand to it's original level:

demand_df.demand ./= 1.5;

For our experiment, we're going to scale the variable costs of both factories by a set of values from 0.0 to 1.5:

scale_factors = 0:0.1:1.5
0.0:0.1:1.5

At a high level, we're going to loop over the scale factors for A, then the scale factors for B, rescale the input data, call our solve_factory_scheduling example, and then store the optimal objective value in the following cost matrix:

cost = zeros(length(scale_factors), length(scale_factors));

Because we're modifying factory_df in-place, we need to store the original variable costs in a new column:

factory_df[!, :old_variable_cost] = copy(factory_df.variable_cost);

Then, we need a function to scale the :variable_cost column for a particular factory by a value scale:

function scale_variable_cost(df, factory, scale)
rows = df.factory .== factory
df[rows, :variable_cost] .=
round.(Int, df[rows, :old_variable_cost] .* scale)
return
end
scale_variable_cost (generic function with 1 method)

Our experiment is just a nested for-loop, modifying A and B and storing the cost:

for (j, a) in enumerate(scale_factors)
scale_variable_cost(factory_df, "A", a)
for (i, b) in enumerate(scale_factors)
scale_variable_cost(factory_df, "B", b)
cost[i, j] = solve_factory_scheduling(demand_df, factory_df).cost
end
end

Let's visualize the cost matrix:

StatsPlots.contour(
scale_factors,
scale_factors,
cost;
xlabel = "Scale of factory A",
ylabel = "Scale of factory B",
)

What can you infer from the solution?

Info

The Power Systems tutorial explains a number of other ways you can structure a problem to perform a parametric analysis of the solution. In particular, you can use in-place modification to reduce the time it takes to build and solve the resulting models.