Getting started with sets and indexing
This tutorial was generated using Literate.jl. Download the source as a .jl
file.
Most introductory courses to linear programming will teach you to identify sets over which the decision variables and constraints are indexed. Therefore, it is common to write variables such as $x_i$ for all $i \in I$.
A common stumbling block for new users to JuMP is that JuMP does not provide specialized syntax for constructing and manipulating these sets.
We made this decision because Julia already provides a wealth of data structures for working with sets.
In contrast, because tools like AMPL are stand-alone software packages, they had to define their own syntax for set construction and manipulation. Indeed, the AMPL Book has two entire chapters devoted to sets and indexing (Chapter 5, "Simple Sets and Indexing," and Chapter 6, "Compound Sets and Indexing").
The purpose of this tutorial is to demonstrate a variety of ways in which you can construct and manipulate sets for optimization models.
If you haven't already, you should first read Getting started with JuMP.
using JuMP
Unordered sets
Unordered sets are useful to describe non-numeric indices, such as the names of cities or types of products.
The most common way to construct a set is by creating a vector:
animals = ["dog", "cat", "chicken", "cow", "pig"]
model = Model()
@variable(model, x[animals])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["dog", "cat", "chicken", "cow", "pig"]
And data, a 5-element Vector{VariableRef}:
x[dog]
x[cat]
x[chicken]
x[cow]
x[pig]
We can also use things like the keys
of a dictionary:
weight_of_animals = Dict(
"dog" => 20.0,
"cat" => 5.0,
"chicken" => 2.0,
"cow" => 720.0,
"pig" => 150.0,
)
animal_keys = keys(weight_of_animals)
model = Model()
@variable(model, x[animal_keys])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["cow", "chicken", "cat", "pig", "dog"]
And data, a 5-element Vector{VariableRef}:
x[cow]
x[chicken]
x[cat]
x[pig]
x[dog]
A third option is to use Julia's Set
object.
animal_set = Set()
for animal in keys(weight_of_animals)
push!(animal_set, animal)
end
animal_set
Set{Any} with 5 elements:
"cow"
"chicken"
"cat"
"pig"
"dog"
The nice thing about Set
s is that they automatically remove duplicates:
push!(animal_set, "dog")
animal_set
Set{Any} with 5 elements:
"cow"
"chicken"
"cat"
"pig"
"dog"
Note how dog
does not appear twice.
model = Model()
@variable(model, x[animal_set])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["cow", "chicken", "cat", "pig", "dog"]
And data, a 5-element Vector{VariableRef}:
x[cow]
x[chicken]
x[cat]
x[pig]
x[dog]
Sets of numbers
Sets of numbers are useful to describe sets that are ordered, such as years or elements in a vector. The easiest way to create sets of numbers is to use Julia's range
syntax.
These can start at 1
:
model = Model()
@variable(model, x[1:4])
4-element Vector{VariableRef}:
x[1]
x[2]
x[3]
x[4]
but they don't have to:
model = Model()
@variable(model, x[2012:2021])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, 2012:2021
And data, a 10-element Vector{VariableRef}:
x[2012]
x[2013]
x[2014]
x[2015]
x[2016]
x[2017]
x[2018]
x[2019]
x[2020]
x[2021]
Ranges also have a start:step:stop
syntax. So the Olympic years are:
model = Model()
@variable(model, x[1896:4:2020])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, 1896:4:2020
And data, a 32-element Vector{VariableRef}:
x[1896]
x[1900]
x[1904]
x[1908]
x[1912]
x[1916]
x[1920]
x[1924]
x[1928]
x[1932]
⋮
x[1988]
x[1992]
x[1996]
x[2000]
x[2004]
x[2008]
x[2012]
x[2016]
x[2020]
Sets of other things
An important observation is that you can have any Julia type as the element of a set. It doesn't have to be a String
or a Number
. For example, you can have tuples:
sources = ["A", "B", "C"]
sinks = ["D", "E"]
S = [(source, sink) for source in sources, sink in sinks]
model = Model()
@variable(model, x[S])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [("A", "D"), ("B", "D"), ("C", "D"), ("A", "E"), ("B", "E"), ("C", "E")]
And data, a 6-element Vector{VariableRef}:
x[("A", "D")]
x[("B", "D")]
x[("C", "D")]
x[("A", "E")]
x[("B", "E")]
x[("C", "E")]
x[("A", "D")]
\[ x_{("A", "D")} \]
For multi-dimensional sets, you can use JuMP's syntax for constructing Containers:
model = Model()
@variable(model, x[sources, sinks])
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
Dimension 1, ["A", "B", "C"]
Dimension 2, ["D", "E"]
And data, a 3×2 Matrix{VariableRef}:
x[A,D] x[A,E]
x[B,D] x[B,E]
x[C,D] x[C,E]
x["A", "D"]
\[ x_{A,D} \]
Note how we indexed x["A", "D"]
instead of x[("A", "D")]
as above.
Sets to watch out for
JuMP supports any sets which are iterable, that is, the set set
supports a for-loop like: [i for i in set]
. This causes a few common errors.
First, if T = 3
, you may pass the integer T
by mistake instead of a range like 1:T
:
model = Model()
T = 3
@variable(model, x[T])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [3]
And data, a 1-element Vector{VariableRef}:
x[3]
This results in a single variable being created, instead of three as desired. Because this is a common error, a warning is printed, advising you to pass a Vector{Int}
instead:
@variable(model, x_fixed[[T]])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [3]
And data, a 1-element Vector{VariableRef}:
x_fixed[3]
Second, because String
s are iterable, passing a "index"
as a singleton index is the same as passing ['i', 'n', 'd', 'e', 'x']
:
@variable(model, y["index"])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ['i', 'n', 'd', 'e', 'x']
And data, a 5-element Vector{VariableRef}:
y[i]
y[n]
y[d]
y[e]
y[x]
This time, a warning is not printed, but the work-around is similar, pass a Vector{String}
instead:
@variable(model, y_fixed[["index"]])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["index"]
And data, a 1-element Vector{VariableRef}:
y_fixed[index]
As a rule of thumb, if you want an index with one element, avoid confusion by passing [index]
instead of index
.
Set operations
Julia has built-in support for set operations such as union
, intersect
, and setdiff
.
Therefore, to create a set of all years in which the summer Olympics were held, we can use:
baseline = 1896:4:2020
cancelled = [1916, 1940, 1944, 2020]
off_year = [2021]
olympic_years = union(setdiff(baseline, cancelled), off_year)
29-element Vector{Int64}:
1896
1900
1904
1908
1912
1920
1924
1928
1932
1936
⋮
1988
1992
1996
2000
2004
2008
2012
2016
2021
You can also find the number of elements (that is, the cardinality) in a set using length
:
length(olympic_years)
29
Set membership operations
To compute membership of sets, use the in
function.
2000 in olympic_years
true
2001 in olympic_years
false
Indexing expressions
Use Julia's generator syntax to compute new sets, such as the list of Olympic years that are divisible by 3:
olympic_3_years = [year for year in olympic_years if mod(year, 3) == 0]
model = Model()
@variable(model, x[olympic_3_years])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [1896, 1908, 1920, 1932, 1956, 1968, 1980, 1992, 2004, 2016]
And data, a 10-element Vector{VariableRef}:
x[1896]
x[1908]
x[1920]
x[1932]
x[1956]
x[1968]
x[1980]
x[1992]
x[2004]
x[2016]
Alternatively, use JuMP's syntax for constructing Containers:
model = Model()
@variable(model, x[year in olympic_years; mod(year, 3) == 0])
SparseAxisArray{VariableRef, 1, Tuple{Int64}} with 10 entries:
[1896] = x[1896]
[1908] = x[1908]
[1920] = x[1920]
[1932] = x[1932]
[1956] = x[1956]
[1968] = x[1968]
[1980] = x[1980]
[1992] = x[1992]
⋮
[2004] = x[2004]
[2016] = x[2016]
Compound sets
Consider a transportation problem in which we need to ship goods between cities. We have been provided a list of cities:
cities = ["Auckland", "Wellington", "Christchurch", "Dunedin"]
4-element Vector{String}:
"Auckland"
"Wellington"
"Christchurch"
"Dunedin"
and a distance matrix which records the shipping distance between pairs of cities. If we can't ship between two cities, the distance is 0
.
distances = [0 643 1071 1426; 0 0 436 790; 0 0 0 360; 1426 0 0 0]
4×4 Matrix{Int64}:
0 643 1071 1426
0 0 436 790
0 0 0 360
1426 0 0 0
Let's have a look at ways we could write a model with an objective function to minimize the total shipping cost. For simplicity, we'll ignore all constraints.
Fix unused variables
One approach is to fix all variables that we can't use to zero. Most solvers are smart-enough to remove these during a presolve phase, so it has a very small impact on performance:
N = length(cities)
model = Model()
@variable(model, x[1:N, 1:N] >= 0)
for i in 1:N, j in 1:N
if distances[i, j] == 0
fix(x[i, j], 0.0; force = true)
end
end
@objective(model, Min, sum(distances[i, j] * x[i, j] for i in 1:N, j in 1:N))
\[ 643 x_{1,2} + 1071 x_{1,3} + 1426 x_{1,4} + 436 x_{2,3} + 790 x_{2,4} + 360 x_{3,4} + 1426 x_{4,1} \]
Filtered summation
Another approach is to define filters whenever we want to sum over our decision variables:
N = length(cities)
model = Model()
@variable(model, x[1:N, 1:N] >= 0)
@objective(
model,
Min,
sum(
distances[i, j] * x[i, j] for i in 1:N, j in 1:N if distances[i, j] > 0
),
)
\[ 643 x_{1,2} + 1071 x_{1,3} + 1426 x_{1,4} + 436 x_{2,3} + 790 x_{2,4} + 360 x_{3,4} + 1426 x_{4,1} \]
Filtered indexing
We could also use JuMP's support for Containers:
N = length(cities)
model = Model()
@variable(model, x[i = 1:N, j = 1:N; distances[i, j] > 0])
@objective(model, Min, sum(distances[i...] * x[i] for i in eachindex(x)))
\[ 643 x_{1,2} + 1071 x_{1,3} + 1426 x_{1,4} + 436 x_{2,3} + 790 x_{2,4} + 360 x_{3,4} + 1426 x_{4,1} \]
The i...
is called a "splat." It converts a tuple like (1, 2)
into two indices like distances[1, 2]
.
Converting to a different data structure
Another approach, and one that is often the most readable, is to convert the data you have into something that is easier to work with. Originally, we had a vector of strings and a matrix of distances. What we really need is something that maps usable origin-destination pairs to distances. A dictionary is an obvious choice:
routes = Dict(
(a, b) => distances[i, j] for
(i, a) in enumerate(cities), (j, b) in enumerate(cities) if
distances[i, j] > 0
)
Dict{Tuple{String, String}, Int64} with 7 entries:
("Auckland", "Wellington") => 643
("Wellington", "Christchurch") => 436
("Wellington", "Dunedin") => 790
("Christchurch", "Dunedin") => 360
("Auckland", "Dunedin") => 1426
("Dunedin", "Auckland") => 1426
("Auckland", "Christchurch") => 1071
Then, we can create our model like so:
model = Model()
@variable(model, x[keys(routes)])
@objective(model, Min, sum(v * x[k] for (k, v) in routes))
\[ 643 x_{("Auckland", "Wellington")} + 436 x_{("Wellington", "Christchurch")} + 790 x_{("Wellington", "Dunedin")} + 360 x_{("Christchurch", "Dunedin")} + 1426 x_{("Auckland", "Dunedin")} + 1426 x_{("Dunedin", "Auckland")} + 1071 x_{("Auckland", "Christchurch")} \]
This has a number of benefits over the other approaches, including a compacter algebraic model and variables that are named in a more meaningful way.
If you're struggling to formulate a problem using the available syntax in JuMP, it's probably a sign that you should convert your data into a different form.
Next steps
The purpose of this tutorial was to show how JuMP does not have specialized syntax for set creation and manipulation. Instead, you should use the tools provided by Julia itself.
This is both an opportunity and a challenge, because you are free to pick the syntax and data structures that best suit your problem, but for new users it can be daunting to decide which structure to use.
Read through some of the other JuMP tutorials to get inspiration and ideas for how you can use Julia's syntax and data structures to your advantage.