# Getting started with sets and indexing

Most introductory courses to linear programming will teach you to identify sets over which the decision variables and constraints are indexed. Therefore, it is common to write variables such as $x_i$ for all $i \in I$.

A common stumbling block for new users to JuMP is that JuMP does not provide specialized syntax for constructing and manipulating these sets.

We made this decision because Julia already provides a wealth of data structures for working with sets.

In contrast, because tools like AMPL are stand-alone software packages, they had to define their own syntax for set construction and manipulation. Indeed, the AMPL Book has two entire chapters devoted to sets and indexing (V: Simple Sets and Indexing, and VI: Compound Sets and Indexing).

The purpose of this tutorial is to demonstrate a variety of ways in which you can construct and manipulate sets for optimization models.

If you haven't already, you should first read Getting started with JuMP.

using JuMP

## Unordered sets

Unordered sets are useful to describe non-numeric indices, such as the names of cities or types of products.

The most common way to construct a set is by creating a vector:

animals = ["dog", "cat", "chicken", "cow", "pig"]
model = Model()
@variable(model, x[animals])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["dog", "cat", "chicken", "cow", "pig"]
And data, a 5-element Vector{VariableRef}:
x[dog]
x[cat]
x[chicken]
x[cow]
x[pig]

We can also use things like the keys of a dictionary:

weight_of_animals = Dict(
"dog" => 20.0,
"cat" => 5.0,
"chicken" => 2.0,
"cow" => 720.0,
"pig" => 150.0,
)
animal_keys = keys(weight_of_animals)
model = Model()
@variable(model, x[animal_keys])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["cow", "chicken", "cat", "pig", "dog"]
And data, a 5-element Vector{VariableRef}:
x[cow]
x[chicken]
x[cat]
x[pig]
x[dog]

A third option is to use Julia's Set object.

animal_set = Set()
for animal in keys(weight_of_animals)
push!(animal_set, animal)
end
animal_set
Set{Any} with 5 elements:
"cow"
"chicken"
"cat"
"pig"
"dog"

The nice thing about Sets is that they automatically remove duplicates:

push!(animal_set, "dog")
animal_set
Set{Any} with 5 elements:
"cow"
"chicken"
"cat"
"pig"
"dog"

Note how dog does not appear twice.

model = Model()
@variable(model, x[animal_set])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["cow", "chicken", "cat", "pig", "dog"]
And data, a 5-element Vector{VariableRef}:
x[cow]
x[chicken]
x[cat]
x[pig]
x[dog]

## Sets of numbers

Sets of numbers are useful to describe sets that are ordered, such as years or elements in a vector. The easiest way to create sets of numbers is to use Julia's range syntax.

These can start at 1:

model = Model()
@variable(model, x[1:4])
4-element Vector{VariableRef}:
x[1]
x[2]
x[3]
x[4]

but they don't have to:

model = Model()
@variable(model, x[2012:2021])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, 2012:2021
And data, a 10-element Vector{VariableRef}:
x[2012]
x[2013]
x[2014]
x[2015]
x[2016]
x[2017]
x[2018]
x[2019]
x[2020]
x[2021]

Ranges also have a start:step:stop syntax. So the Olympic years are:

model = Model()
@variable(model, x[1896:4:2020])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, 1896:4:2020
And data, a 32-element Vector{VariableRef}:
x[1896]
x[1900]
x[1904]
x[1908]
x[1912]
x[1916]
x[1920]
x[1924]
x[1928]
x[1932]
⋮
x[1988]
x[1992]
x[1996]
x[2000]
x[2004]
x[2008]
x[2012]
x[2016]
x[2020]

## Sets of other things

An important observation is that you can have any Julia type as the element of a set. It doesn't have to be a String or a Number. For example, you can have tuples:

sources = ["A", "B", "C"]
sinks = ["D", "E"]
S = [(source, sink) for source in sources, sink in sinks]
model = Model()
@variable(model, x[S])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [("A", "D"), ("B", "D"), ("C", "D"), ("A", "E"), ("B", "E"), ("C", "E")]
And data, a 6-element Vector{VariableRef}:
x[("A", "D")]
x[("B", "D")]
x[("C", "D")]
x[("A", "E")]
x[("B", "E")]
x[("C", "E")]
x[("A", "D")]
$$$x_{("A", "D")}$$$

For multi-dimensional sets, you can use JuMP's syntax for constructing Containers:

model = Model()
@variable(model, x[sources, sinks])
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
Dimension 1, ["A", "B", "C"]
Dimension 2, ["D", "E"]
And data, a 3×2 Matrix{VariableRef}:
x[A,D]  x[A,E]
x[B,D]  x[B,E]
x[C,D]  x[C,E]
x["A", "D"]
$$$x_{A,D}$$$
Info

Note how we indexed x["A", "D"] instead of x[("A", "D")] as above.

## Sets to watch out for

JuMP supports any sets which are iterable, that is, the set set supports a for-loop like: [i for i in set]. This causes a few common errors.

First, if T = 3, you may pass the integer T by mistake instead of a range like 1:T:

model = Model()
T = 3
@variable(model, x[T])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [3]
And data, a 1-element Vector{VariableRef}:
x[3]

This results in a single variable being created, instead of three as desired. Because this is a common error, a warning is printed, advising you to pass a Vector{Int} instead:

@variable(model, x_fixed[[T]])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [3]
And data, a 1-element Vector{VariableRef}:
x_fixed[3]

Second, because Strings are iterable, passing a "index" as a singleton index is the same as passing ['i', 'n', 'd', 'e', 'x']:

@variable(model, y["index"])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ['i', 'n', 'd', 'e', 'x']
And data, a 5-element Vector{VariableRef}:
y[i]
y[n]
y[d]
y[e]
y[x]

This time, a warning is not printed, but the work-around is similar, pass a Vector{String} instead:

@variable(model, y_fixed[["index"]])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["index"]
And data, a 1-element Vector{VariableRef}:
y_fixed[index]
Tip

As a rule of thumb, if you want an index with one element, avoid confusion by passing [index] instead of index.

## Set operations

Julia has built-in support for set operations such as union, intersect, and setdiff.

Therefore, to create a set of all years in which the summer Olympics were held, we can use:

baseline = 1896:4:2020
cancelled = [1916, 1940, 1944, 2020]
off_year = [2021]
olympic_years = union(setdiff(baseline, cancelled), off_year)
29-element Vector{Int64}:
1896
1900
1904
1908
1912
1920
1924
1928
1932
1936
⋮
1988
1992
1996
2000
2004
2008
2012
2016
2021

You can also find the number of elements (i.e., the cardinality) in a set using length:

length(olympic_years)
29

## Set membership operations

To compute membership of sets, use the in function.

2000 in olympic_years
true
2001 in olympic_years
false

## Indexing expressions

Use Julia's generator syntax to compute new sets, such as the list of Olympic years that are divisible by 3:

olympic_3_years = [year for year in olympic_years if mod(year, 3) == 0]
model = Model()
@variable(model, x[olympic_3_years])
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [1896, 1908, 1920, 1932, 1956, 1968, 1980, 1992, 2004, 2016]
And data, a 10-element Vector{VariableRef}:
x[1896]
x[1908]
x[1920]
x[1932]
x[1956]
x[1968]
x[1980]
x[1992]
x[2004]
x[2016]

Alternatively, use JuMP's syntax for constructing Containers:

model = Model()
@variable(model, x[year in olympic_years; mod(year, 3) == 0])
JuMP.Containers.SparseAxisArray{VariableRef, 1, Tuple{Int64}} with 10 entries:
[1896]  =  x[1896]
[1908]  =  x[1908]
[1920]  =  x[1920]
[1932]  =  x[1932]
[1956]  =  x[1956]
[1968]  =  x[1968]
[1980]  =  x[1980]
[1992]  =  x[1992]
⋮
[2004]  =  x[2004]
[2016]  =  x[2016]

## Compound sets

Consider a transportation problem in which we need to ship goods between cities. We have been provided a list of cities:

cities = ["Auckland", "Wellington", "Christchurch", "Dunedin"]
4-element Vector{String}:
"Auckland"
"Wellington"
"Christchurch"
"Dunedin"

and a distance matrix which records the shipping distance between pairs of cities. If we can't ship between two cities, the distance is 0.

distances = [0 643 1071 1426; 0 0 436 790; 0 0 0 360; 1426 0 0 0]
4×4 Matrix{Int64}:
0  643  1071  1426
0    0   436   790
0    0     0   360
1426    0     0     0

Let's have a look at ways we could write a model with an objective function to minimize the total shipping cost. For simplicity, we'll ignore all constraints.

### Fix unused variables

One approach is to fix all variables that we can't use to zero. Most solvers are smart-enough to remove these during a presolve phase, so it has a very small impact on performance:

N = length(cities)
model = Model()
@variable(model, x[1:N, 1:N] >= 0)
for i in 1:N, j in 1:N
if distances[i, j] == 0
fix(x[i, j], 0.0; force = true)
end
end
@objective(model, Min, sum(distances[i, j] * x[i, j] for i in 1:N, j in 1:N))
$$$643 x_{1,2} + 1071 x_{1,3} + 1426 x_{1,4} + 436 x_{2,3} + 790 x_{2,4} + 360 x_{3,4} + 1426 x_{4,1}$$$

### Filtered summation

Another approach is to define filters whenever we want to sum over our decision variables:

N = length(cities)
model = Model()
@variable(model, x[1:N, 1:N] >= 0)
@objective(
model,
Min,
sum(
distances[i, j] * x[i, j] for i in 1:N, j in 1:N if distances[i, j] > 0
),
)
$$$643 x_{1,2} + 1071 x_{1,3} + 1426 x_{1,4} + 436 x_{2,3} + 790 x_{2,4} + 360 x_{3,4} + 1426 x_{4,1}$$$

### Filtered indexing

We could also use JuMP's support for Containers:

N = length(cities)
model = Model()
@variable(model, x[i = 1:N, j = 1:N; distances[i, j] > 0])
@objective(model, Min, sum(distances[i...] * x[i] for i in eachindex(x)))
$$$790 x_{2,4} + 643 x_{1,2} + 1071 x_{1,3} + 1426 x_{4,1} + 360 x_{3,4} + 1426 x_{1,4} + 436 x_{2,3}$$$
Note

The i... is called a "splat". It converts a tuple like (1, 2) into two indices like distances[1, 2].

### Converting to a different data structure

Another approach, and one that is often the most readable, is to convert the data you have into something that is easier to work with. Originally, we had a vector of strings and a matrix of distances. What we really need is something that maps usable origin-destination pairs to distances. A dictionary is an obvious choice:

routes = Dict(
(a, b) => distances[i, j] for
(i, a) in enumerate(cities), (j, b) in enumerate(cities) if
distances[i, j] > 0
)
Dict{Tuple{String, String}, Int64} with 7 entries:
("Auckland", "Wellington")     => 643
("Wellington", "Christchurch") => 436
("Wellington", "Dunedin")      => 790
("Christchurch", "Dunedin")    => 360
("Auckland", "Dunedin")        => 1426
("Dunedin", "Auckland")        => 1426
("Auckland", "Christchurch")   => 1071

Then, we can create our model like so:

model = Model()
@variable(model, x[keys(routes)])
@objective(model, Min, sum(v * x[k] for (k, v) in routes))
$$$643 x_{("Auckland", "Wellington")} + 436 x_{("Wellington", "Christchurch")} + 790 x_{("Wellington", "Dunedin")} + 360 x_{("Christchurch", "Dunedin")} + 1426 x_{("Auckland", "Dunedin")} + 1426 x_{("Dunedin", "Auckland")} + 1071 x_{("Auckland", "Christchurch")}$$$

This has a number of benefits over the other approaches, including a compacter algebraic model and variables that are named in a more meaningful way.

Tip

If you're struggling to formulate a problem using the available syntax in JuMP, it's probably a sign that you should convert your data into a different form.

## Next steps

The purpose of this tutorial was to show how JuMP does not have specialized syntax for set creation and manipulation. Instead, you should use the tools provided by Julia itself.

This is both an opportunity and a challenge, because you are free to pick the syntax and data structures that best suit your problem, but for new users it can be daunting to decide which structure to use.

Read through some of the other JuMP tutorials to get inspiration and ideas for how you can use Julia's syntax and data structures to your advantage.

Tip