# Getting started with sets and indexing

Most introductory courses to linear programming will teach you to identify sets over which the decision variables and constraints are indexed. Therefore, it is common to write variables such as $x_i$ for all $i \in I$.

A common stumbling block for new users to JuMP is that *JuMP does not provide specialized syntax for constructing and manipulating these sets*.

We made this decision because Julia already provides a wealth of data structures for working with sets.

In contrast, because tools like AMPL are stand-alone software packages, they had to define their own syntax for set construction and manipulation. Indeed, the AMPL Book has two entire chapters devoted to sets and indexing (V: Simple Sets and Indexing, and VI: Compound Sets and Indexing).

The purpose of this tutorial is to demonstrate a variety of ways in which you can construct and manipulate sets for optimization models.

If you haven't already, you should first read Getting started with JuMP.

`using JuMP`

## Unordered sets

Unordered sets are useful to describe non-numeric indices, such as the names of cities or types of products.

The most common way to construct a set is by creating a vector:

```
animals = ["dog", "cat", "chicken", "cow", "pig"]
model = Model()
@variable(model, x[animals])
```

```
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["dog", "cat", "chicken", "cow", "pig"]
And data, a 5-element Vector{VariableRef}:
x[dog]
x[cat]
x[chicken]
x[cow]
x[pig]
```

We can also use things like the `keys`

of a dictionary:

```
weight_of_animals = Dict(
"dog" => 20.0,
"cat" => 5.0,
"chicken" => 2.0,
"cow" => 720.0,
"pig" => 150.0,
)
animal_keys = keys(weight_of_animals)
model = Model()
@variable(model, x[animal_keys])
```

```
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["cow", "chicken", "cat", "pig", "dog"]
And data, a 5-element Vector{VariableRef}:
x[cow]
x[chicken]
x[cat]
x[pig]
x[dog]
```

A third option is to use Julia's `Set`

object.

```
animal_set = Set()
for animal in keys(weight_of_animals)
push!(animal_set, animal)
end
animal_set
```

```
Set{Any} with 5 elements:
"cow"
"chicken"
"cat"
"pig"
"dog"
```

The nice thing about `Set`

s is that they automatically remove duplicates:

```
push!(animal_set, "dog")
animal_set
```

```
Set{Any} with 5 elements:
"cow"
"chicken"
"cat"
"pig"
"dog"
```

Note how `dog`

does not appear twice.

```
model = Model()
@variable(model, x[animal_set])
```

```
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, ["cow", "chicken", "cat", "pig", "dog"]
And data, a 5-element Vector{VariableRef}:
x[cow]
x[chicken]
x[cat]
x[pig]
x[dog]
```

## Sets of numbers

Sets of numbers are useful to decribe sets that are ordered, such as years or elements in a vector. The easiest way to create sets of numbers is to use Julia's `range`

syntax.

These can start at `1`

:

```
model = Model()
@variable(model, x[1:4])
```

```
4-element Vector{VariableRef}:
x[1]
x[2]
x[3]
x[4]
```

but they don't have to:

```
model = Model()
@variable(model, x[2012:2021])
```

```
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, 2012:2021
And data, a 10-element Vector{VariableRef}:
x[2012]
x[2013]
x[2014]
x[2015]
x[2016]
x[2017]
x[2018]
x[2019]
x[2020]
x[2021]
```

Ranges also have a `start:step:stop`

syntax. So the Olympic years are:

```
model = Model()
@variable(model, x[1896:4:2020])
```

```
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, 1896:4:2020
And data, a 32-element Vector{VariableRef}:
x[1896]
x[1900]
x[1904]
x[1908]
x[1912]
x[1916]
x[1920]
x[1924]
x[1928]
x[1932]
⋮
x[1988]
x[1992]
x[1996]
x[2000]
x[2004]
x[2008]
x[2012]
x[2016]
x[2020]
```

## Sets of other things

An important observation is that you can have *any* Julia type as the element of a set. It doesn't have to be a `String`

or a `Number`

. For example, you can have tuples:

```
sources = ["A", "B", "C"]
sinks = ["D", "E"]
S = [(source, sink) for source in sources, sink in sinks]
model = Model()
@variable(model, x[S])
```

```
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [("A", "D"), ("B", "D"), ("C", "D"), ("A", "E"), ("B", "E"), ("C", "E")]
And data, a 6-element Vector{VariableRef}:
x[("A", "D")]
x[("B", "D")]
x[("C", "D")]
x[("A", "E")]
x[("B", "E")]
x[("C", "E")]
```

`x[("A", "D")]`

\[ x_{("A", "D")} \]

For multi-dimensional sets, you can use JuMP's syntax for constructing Containers:

```
model = Model()
@variable(model, x[sources, sinks])
```

```
2-dimensional DenseAxisArray{VariableRef,2,...} with index sets:
Dimension 1, ["A", "B", "C"]
Dimension 2, ["D", "E"]
And data, a 3×2 Matrix{VariableRef}:
x[A,D] x[A,E]
x[B,D] x[B,E]
x[C,D] x[C,E]
```

`x["A", "D"]`

\[ x_{A,D} \]

Note how we indexed `x["A", "D"]`

instead of `x[("A", "D")]`

as above.

## Set operations

Julia has built-in support for set operations such as `union`

, `intersect`

, and `setdiff`

.

Therefore, to create a set of all years in which the summer Olympics were held, we can use:

```
baseline = 1896:4:2020
cancelled = [1916, 1940, 1944, 2020]
off_year = [2021]
olympic_years = union(setdiff(baseline, cancelled), off_year)
```

```
29-element Vector{Int64}:
1896
1900
1904
1908
1912
1920
1924
1928
1932
1936
⋮
1988
1992
1996
2000
2004
2008
2012
2016
2021
```

You can also find the number of elements (i.e., the cardinality) in a set using `length`

:

`length(olympic_years)`

`29`

## Set membership operations

To compute membership of sets, use the `in`

function.

`2000 in olympic_years`

`true`

`2001 in olympic_years`

`false`

## Indexing expressions

Use Julia's generator syntax to compute new sets, such as the list of Olympic years that are divisible by 3:

```
olympic_3_years = [year for year in olympic_years if mod(year, 3) == 0]
model = Model()
@variable(model, x[olympic_3_years])
```

```
1-dimensional DenseAxisArray{VariableRef,1,...} with index sets:
Dimension 1, [1896, 1908, 1920, 1932, 1956, 1968, 1980, 1992, 2004, 2016]
And data, a 10-element Vector{VariableRef}:
x[1896]
x[1908]
x[1920]
x[1932]
x[1956]
x[1968]
x[1980]
x[1992]
x[2004]
x[2016]
```

Alternatively, use JuMP's syntax for constructing Containers:

```
model = Model()
@variable(model, x[year in olympic_years; mod(year, 3) == 0])
```

```
JuMP.Containers.SparseAxisArray{VariableRef, 1, Tuple{Int64}} with 10 entries:
[1896] = x[1896]
[1908] = x[1908]
[1920] = x[1920]
[1932] = x[1932]
[1956] = x[1956]
[1968] = x[1968]
[1980] = x[1980]
[1992] = x[1992]
⋮
[2004] = x[2004]
[2016] = x[2016]
```

## Compound sets

Consider a transportation problem in which we need to ship goods between cities. We have been provided a list of cities:

`cities = ["Auckland", "Wellington", "Christchurch", "Dunedin"]`

```
4-element Vector{String}:
"Auckland"
"Wellington"
"Christchurch"
"Dunedin"
```

and a distance matrix which records the shipping distance between pairs of cities. If we can't ship between two cities, the distance is `0`

.

`distances = [0 643 1071 1426; 0 0 436 790; 0 0 0 360; 1426 0 0 0]`

```
4×4 Matrix{Int64}:
0 643 1071 1426
0 0 436 790
0 0 0 360
1426 0 0 0
```

Let's have a look at ways we could write a model with an objective function to minimize the total shipping cost. For simplicity, we'll ignore all constraints.

### Fix unused variables

One approach is to fix all variables that we can't use to zero. Most solvers are smart-enough to remove these during a presolve phase, so it has a very small impact on performance:

```
N = length(cities)
model = Model()
@variable(model, x[1:N, 1:N] >= 0)
for i in 1:N, j in 1:N
if distances[i, j] == 0
fix(x[i, j], 0.0; force = true)
end
end
@objective(model, Min, sum(distances[i, j] * x[i, j] for i in 1:N, j in 1:N))
```

\[ 643 x_{1,2} + 1071 x_{1,3} + 1426 x_{1,4} + 436 x_{2,3} + 790 x_{2,4} + 360 x_{3,4} + 1426 x_{4,1} \]

### Filtered summation

Another approach is to define filters whenever we want to sum over our decision variables:

```
N = length(cities)
model = Model()
@variable(model, x[1:N, 1:N] >= 0)
@objective(
model,
Min,
sum(
distances[i, j] * x[i, j] for i in 1:N, j in 1:N if distances[i, j] > 0
),
)
```

\[ 643 x_{1,2} + 1071 x_{1,3} + 1426 x_{1,4} + 436 x_{2,3} + 790 x_{2,4} + 360 x_{3,4} + 1426 x_{4,1} \]

### Filtered indexing

We could also use JuMP's support for Containers:

```
N = length(cities)
model = Model()
@variable(model, x[i = 1:N, j = 1:N; distances[i, j] > 0])
@objective(model, Min, sum(distances[i...] * x[i] for i in eachindex(x)))
```

\[ 790 x_{2,4} + 643 x_{1,2} + 1071 x_{1,3} + 1426 x_{4,1} + 360 x_{3,4} + 1426 x_{1,4} + 436 x_{2,3} \]

The `i...`

is called a "splat". It converts a tuple like `(1, 2)`

into two indices like `distances[1, 2]`

.

### Converting to a different data structure

Another approach, and one that is often the most readable, is to convert the data you have into something that is easier to work with. Originally, we had a vector of strings and a matrix of distances. What we really need is something that maps usable origin-destination pairs to distances. A dictionary is an obvious choice:

```
routes = Dict(
(a, b) => distances[i, j] for
(i, a) in enumerate(cities), (j, b) in enumerate(cities) if
distances[i, j] > 0
)
```

```
Dict{Tuple{String, String}, Int64} with 7 entries:
("Auckland", "Wellington") => 643
("Wellington", "Christchurch") => 436
("Wellington", "Dunedin") => 790
("Christchurch", "Dunedin") => 360
("Auckland", "Dunedin") => 1426
("Dunedin", "Auckland") => 1426
("Auckland", "Christchurch") => 1071
```

Then, we can create our model like so:

```
model = Model()
@variable(model, x[keys(routes)])
@objective(model, Min, sum(v * x[k] for (k, v) in routes))
```

\[ 643 x_{("Auckland", "Wellington")} + 436 x_{("Wellington", "Christchurch")} + 790 x_{("Wellington", "Dunedin")} + 360 x_{("Christchurch", "Dunedin")} + 1426 x_{("Auckland", "Dunedin")} + 1426 x_{("Dunedin", "Auckland")} + 1071 x_{("Auckland", "Christchurch")} \]

This has a number of benefits over the other approaches, including a compacter algebraic model and variables that are named in a more meaningful way.

If you're struggling to formulate a problem using the available syntax in JuMP, it's probably a sign that you should convert your data into a different form.

## Next steps

The purpose of this tutorial was to show how JuMP does not have specialized syntax for set creation and manipulation. Instead, you should use the tools provided by Julia itself.

This is both an opportunity and a challenge, because you are free to pick the syntax and data structures that best suit your problem, but for new users it can be daunting to decide which structure to use.

Read through some of the other JuMP tutorials to get inspiration and ideas for how you can use Julia's syntax and data structures to your advantage.

This tutorial was generated using Literate.jl. View the source `.jl`

file on GitHub.