Optimization & Operations Research

Modeling with ILPs

Example: Knapsack Problem

We want to fill a backpack with the most valuable items possible
However: We can only carry a certain amount of weight.

We have $n$ items.
Each item $i$ has a value $c_i$.
Each item $i$ has a weight $w_i$.
We can only carry a specific amount of weight $W$.

Complete ILP Formulation:

\[ \begin{align*} \max \quad & \sum_{i=1}^n c_i x_i \\ \text{s.t.} \quad & \sum_{i=1}^n w_i x_i \leq W \\ & x_i \in \{0, 1\} \quad & \forall i \in \{1, \dots, n\} \end{align*} \]

Example

W = 10

Item	Value	Weight
A	20	8
B	15	7
C	7	3


          import mip

          model = mip.Model()

          x_A = model.add_var(var_type=mip.BINARY)
          x_B = model.add_var(var_type=mip.BINARY)
          x_C = model.add_var(var_type=mip.BINARY)

          model.objective = mip.maximize(20 * x_A + 15 * x_B + 7 * x_C)
          model.add_constr(8 * x_A + 7 * x_B + 3 * x_C <= 10)

          model.optimize()

          print(x_A.x)
          print(x_B.x)
          print(x_C.x)

Unlike with shortest paths, integrality is essential.

Fractional solution:

Sort items by $\frac{\text{Value}}{\text{Weight}}$.
Fill the backpack in descending order. The last item is taken partially.

Item	Value	Weight	$\frac{\text{Value}}{\text{Weight}}$
A	20	8	2.5
C	7	3	2.333
B	15	7	2.143

Choose A at 100%: Remaining capacity: 2
Choose C at 66%: Remaining capacity: 0

Total value: 24.666

A common pattern in modeling: Binary variables that represent an assignment.

The Knapsack problem is only "weakly" NP-hard.

If all weights are integers (with not too large numbers), we can efficiently solve this using Dynamic Programming.

Dynamic Programming Solution:

We create a table: What is the best solution for all weights up to W, with all items up to the i-th?

W	1	2	3	4	5	6	7	8	9	10
A	0	0	0	0	0	0	0	20	20	20
A, B	0	0	0	0	0	0	15	20	20	20
A, B, C	0	0	7	7	7	7	15	20	20	22

We either take:

The entry from the last row with the same weight
The entry from the last row minus the current weight + current value

\[ \text{opt}(i, w) = \max(\text{opt}(i-1, w), \text{opt}(i-1, w - w_i) + c_i) \]

Dynamic Programming = Recursion + Cache


          values = [20, 15, 7]
          weights = [8, 7, 3]

          import numpy as np

          def solve_knapsack(values, weights, limit):
              assert len(values) == len(weights)
              solution = np.zeros(shape=(len(values)+1, limit+1))

              for item_idx, (value, weight) in enumerate(zip(values, weights)):
                  for w in range(limit+1):
                      if w >= weight:
                          solution[item_idx+1, w] = max(
                              solution[item_idx, w],
                              solution[item_idx, w - weight] + value
                          )
                      else:
                          solution[item_idx+1, w] = solution[item_idx, w]
              return solution[len(values), limit]

          solve_knapsack(values, weights, 10)

This solution does not work well when we have very large weights and capacity.

Usually, this is not a problem.

https://xkcd.com/287/


          import mip

          prices = {
              "Mixed Fruit": 2.15,
              "French Fries": 2.75,
              "Side Salad": 3.35,
              "Hot Wings": 3.55,
              "Mozzerella Sticks": 4.20,
              "Sampler Plate": 5.80
          }

          m = mip.Model("restaurant order")

          variables = {
              item: m.add_var(var_type=mip.INTEGER) for item in prices
          }

          # m.objective = mip.maximize(sum(variables.values()))

          m.add_constr( sum(variables[item] * prices[item] for item in prices) == 15.05 )

          m.optimize()

          for item, var in variables.items():
              if var.x > 0.5:
                  print(f"{item}: {var.x}")

Extension: Bin Packing

We want to pack items of different sizes ($s_i$). A bin has size $B$.
How many bins do we need at least to pack all items, and how do we pack them?

How do we model this?

Idea: Variable $x_{i,j}$ whether item $i$ is packed in bin $j$.

But we don't know how many bins we will need in the end.

Idea: We take more bins than we need. For example, one per item.

We also take another variable $y_j$, which indicates whether we use the bin.

Objective function: Minimize the number of used bins

\[ \min \quad \sum_{j=1}^n y_j \]

Constraints: Capacity of used bins must not be exceeded.

\[ \sum_{i=1}^n s_i x_{i,j} \leq B y_j \quad \forall j \]

Here we have a common pattern: The binary variable $y_j$ activates the capacity of the constraint.

Constraints: Each item must be packed.

\[ \sum_{j=1}^n x_{i,j} = 1 \quad \forall i \]

\[ \begin{align*} \min \quad & \sum_{j=1}^n y_j \\ \text{s.t.} \quad & \sum_{i=1}^n s_i x_{i,j} \leq B y_j \quad & \forall j \\ & \sum_{j=1}^n x_{i,j} = 1 \quad & \forall i \\ & x_{i,j} \in \{0, 1\} \quad & \forall i, \forall j \\ & y_i \in \{0, 1\} \quad & \forall i \end{align*} \]

Next Task: We want to create a timetable

We have a set of courses $K$
We have a set of rooms $R$
- Each course $k \in K$ has a set of permissible rooms $R(k)$
We have a set of time slots $T$
- Each course $k \in K$ has a set of permissible time slots $T(k)$
We have pairwise conflicts $C$. $(k_1, k_2) \in C$ implies that $k_1$ and $k_2$ cannot take place simultaneously.

Goal: Minimize the number of conflicts.

Variables

A variable $x_{k, t, r} \in \{0, 1\}$ for each course and allowable room and timeslot.
A variable $y_{k_1, k_2} \in \{0, 1\}$ for each conflict.

Common pattern: a binary variable with multiple indices that encodes which resource is assigned to another resource.

Modeling with many binary variables is often better than with few integer variables.

Objective function:

\[ \min \quad \sum_{(k_1, k_2) \in C} y_{k_1, k_2} \]

We could also elaborate here that some conflicts are more costly than others.

Constraints: Each course needs a time slot and a room.

\[ \sum_{t \in T(k)} \sum_{r \in R(k)} x_{k, t, r} = 1 \quad \forall k \in K \]

Constraints: No room may be double-booked at any given time.

\[ \sum_{k \in K} x_{k, t, r} \leq 1 \quad \forall t \in T, \forall r \in R \]

Constraints: We are only allowed to create a conflict if we set the corresponding variable.

\[ \sum_{r \in R(k_1)} x_{k_1, t, r} + \sum_{r \in R(k_2)} x_{k_2, t, r} \leq 1 + y_{k_1, k_2} \quad \forall (k_1, k_2) \in C, \forall t \in T \]

If we leave out $y$, we get a "hard" conflict.

Pattern: We have a constraint again that we can switch on or off with a variable.

As a result, the condition becomes optional and we can incorporate its violation into the objective function.

Note: Here we can exceed the side constraint by a maximum of 1.

Next example: Machine Scheduling

We have one machine and $n$ tasks.
Each task $i$ has a duration $p_i$.
Each task $i$ has a deadline $d_i$.

We want to minimize the number of tasks that miss their deadline.

Variables:

What we are looking for is primarily the order in which we work on the tasks. For each of the $n$ tasks, there are $n$ possible positions where it can be carried out.

A variable $x_{i, j} \in \{0, 1\}$: Task $i$ is executed at position $j$.
A variable $y_j \in \{0, 1\}$: The job executed at position $j$ has missed its deadline.
A variable $t_j \geq 0$: The time at which the task at the $j$-th position starts.

We have here a real MILP, as we mix integer with fractional variables.

Objective function:

\[ \min \quad \sum_{j=1}^n y_j \]

Constraints: Each task is executed exactly once.

\[ \sum_{j=1}^n x_{i, j} = 1 \quad \forall i \in \{1, \dots, n\} \]

Constraints: Exactly one task is executed at each position.

\[ \sum_{i=1}^n x_{i, j} = 1 \quad \forall j \in \{1, \dots, n\} \]

Constraints: Each task can only start after the previous one.

\[ t_{j} + \sum_{i=1}^n p_i x_{i,j} \leq t_{j+1} \quad \forall j \in \{1, \dots, n-1\} \]

With $\sum_{i=1}^n p_i x_{i,j}$ we only consider the processing time that is relevant at position $j$.

Constraints: A task may only miss its deadline if the corresponding $y$ variable is set:

\[ t_j + \sum_{i=1}^n p_i x_{i,j} \leq \sum_{i=1}^n d_i x_{i,j} + M y_j \quad \forall j \in \{1, \dots, n\} \]

Here we have two patterns: Once we determine the appropriate constant parameter in two places: \[ \sum_{i=1}^n p_i x_{i,j} \\ \sum_{i=1}^n d_i x_{i,j} \]

The same variable appears twice here. That makes the constraint easier to understand.

This can also be formulated as follows:

\[ t_j + \sum_{i=1}^n (p_i - d_i) x_{i,j} \leq M y_j \quad \forall j \in \{1, \dots, n\} \]

Or even like this (since we know that the $x$'s add up to 1):

\[ \sum_{i=1}^n (t_j + p_i - d_i) x_{i,j} \leq M y_j \quad \forall j \in \{1, \dots, n\} \]

The second pattern is the term $M y_j$. Here we disable the constraint when $y_j = 1$.

$M$ is a "large" number, so that the constraint will be easy to fulfill.

With these "Big-M" constraints we can model an "if-then" statement well.

However, they also make the model harder to solve.

Example: $M=10000$. A job is planned in a way that it is $10$ time units late.

With $y=0.001$, the constraint is turned off in the LP relaxation, but does not cost much in the objective function.

If we need such constraints, we should always try to choose $M$ as small as possible.

Here: What is the time a job can be late in the worst case scenario?

\[ M = \sum_{i=1}^n p_i - \min_i d_i \]

The job with the earliest deadline is processed very late.

Even better: For each position $j$, we consider how late the job at that position can be at the latest:

$P_j$: Let the processing time of the $j$ longest jobs be

\[ M_j = P_{j-1} - \min_i (\underbrace{d_i - p_i}_\text{latest start time}) \]

We can also model an "either-or" with this pattern:

\[ \ldots \leq M y \\ \ldots \leq M (1-y) \]

One of the conditions is on and the other one is off.

Example: I want the variable $x$ to be either less than 10 or greater than 100:

\[ x \leq 10 + M y \\ x \geq 100 - M (1-y) \]

If we can discretize time into a grid, we potentially get a more easily solvable model here.
(without Big-M constraints)

In practice, there are various variations of this problem:

Minimize the amount of minutes by which tasks are late.
Tasks can only be processed starting from a specific release time.
We have multiple machines.
Tasks can be interrupted and "parked".
...

A major advantage of MILP modeling is that we can often integrate new requirements into existing models.

This is a major advantage in agile development processes.

Modeling Tip:

Discretize the problem in such a way that we can use binary variables.

Modeling tip:

Use constraints where all variables are only added:

\[ x_1 + x_2 + 2 x_3 \leq 7 \]

Modeling tip:

Use constraints where all constant terms are 1:

\[ x_1 + x_2 - x_3 \geq 1 \]

Modeling Tip:

Combine both tips from above:

\[ x_1 + x_2 + x_3 \leq 1 \]

What do we do if we want to represent a non-linear function?

Example:

We produce two goods, A and B.
We can produce a maximum of 300 units in total.
Product B brings 9 euros per unit.
Product A brings 10 euros for the first 100 units, 8 euros for the next 100 units, and then 6 euros per unit.

With constant revenues per unit of A, it would be simple:

\[ \begin{align*} \max \quad & 10 x_A + 9 x_B \\ \text{s.t.} \quad & x_A + x_B \leq 300 \\ & x_A, x_B \geq 0 \end{align*} \]


          import mip

          model = mip.Model()

          prod_A = model.add_var(lb=0)
          prod_B = model.add_var(lb=0)

          model.add_constr(prod_A + prod_B <= 300)
          model.objective = mip.maximize(10 * prod_A + 9 * prod_B)

          model.optimize()

Actually, we want to solve the following:

\[ \begin{align*} \max \quad & f(x_A) + 9 x_B \\ \text{s.t.} \quad & x_A + x_B \leq 300 \\ & x_A, x_B \geq 0 \end{align*} \]

with

\[ f(x) = \begin{cases} 10 x & \text{if } x \leq 100 \\ 1000 + 8 (x - 100) & \text{if } 100 < x \leq 200 \\ 1800 + 6 (x - 200) & \text{if } 200 < x \end{cases} \]

The revenues $f$ for A are a
piecewise linear function

A piecewise linear function has:

$n$ linear segments
Break points $b_1, \ldots, b_{n+1}$
At break point $i$, the value $f(b_i)$

$x$ must be located on a segment between two break points:

\[ b_k \leq x \leq b_{k+1} \]

If we know the segment, we can represent $x$ as a linear combination of the break points:

\[ x = z_k b_k + z_{k+1} b_{k+1} \\ z_k + z_{k+1} = 1 \]

Example:

$x = 140$

Break points $b_2 = 100$ and $b_3 = 200$.

$x = 0.6 b_2 + 0.4 b_3 = 60 + 80$

Within the segment, $f$ is linear:

\[ f(x) = f(z_k b_k + z_{k+1} b_{k+1}) \]

\[ = z_k f(b_k) + z_{k+1} f(b_{k+1}) \]

Example:

$x = 140$

$f(140) = f(60 + 80) = f(0.6 b_2 + 0.4 b_3)$

$= 0.6 f(b_2) + 0.4 f(b_3)$

$= 0.6 \cdot 1000 + 0.4 \cdot 1800 = 1320$

We can now model $f$ using the break points:

\[ \begin{align*} \max \quad & {\color{red} 0 z_1 + 1000 z_2 + 1800 z_3 + 2400 z_4} + 9 x_B \\ \text{s.t.} \quad & x_A + x_B \leq 300 \\ & {\color{red} x_A = 0 z_1 + 100 z_2 + 200 z_3 + 300 z_4} \\ & {\color{red} z_1 + z_2 + z_3 + z_4 = 1} \\ & x_A, x_B \geq 0 \\ & {\color{red} z_1, z_2, z_3, z_4 \geq 0 } \end{align*} \]

Currently, the model can still mix the segments arbitrarily, for example:

$z_1 = 0.2 \quad z_2 = 0.3 \quad z_3 = 0 \quad z_4 = 0.5$

We need to ensure that only one segment is selected.

Insert a binary variable $y_1, \ldots, y_n$ per segment.

Choose only one segment:
$y_1 + y_2 + y_3 = 1$
Use $z_1$ only if segment $1$ is active:
$z_1 \leq y_1$
Use $z_2$ only if segment $1$ or $2$ is active:
$z_2 \leq y_1 + y_2$
Use $z_3$ only if segment $2$ or $3$ is active:
$z_3 \leq y_2 + y_3$
Use $z_4$ only if segment $3$ is active:
$z_4 \leq y_3$

\[ \begin{align*} \max \quad & 0 z_1 + 1000 z_2 + 1800 z_3 + 2400 z_4 + 9 x_B \\ \text{s.t.} \quad & x_A + x_B \leq 300 \\ & x_A = 0 z_1 + 100 z_2 + 200 z_3 + 300 z_4 \\ & z_1 + z_2 + z_3 + z_4 = 1 \\ & {\color{red} y_1 + y_2 + y_3 = 1 }\\ & {\color{red} z_1 \leq y_1 }\\ & {\color{red} z_2 \leq y_1 + y_2 }\\ & {\color{red} z_3 \leq y_2 + y_3 }\\ & {\color{red} z_4 \leq y_3 }\\ & x_A, x_B \geq 0 \\ & z_1, z_2, z_3, z_4 \geq 0 \\ & {\color{red} y_1, y_2, y_3 \in \{0, 1\}} \end{align*} \]


          import mip

          model = mip.Model(sense=mip.MAXIMIZE)

          prod_A = model.add_var(lb=0)
          prod_B = model.add_var(lb=0)

          model.add_constr(prod_A + prod_B <= 300)

          z_A1 = model.add_var(lb=0, obj= 0)
          z_A2 = model.add_var(lb=0, obj= 1000)
          z_A3 = model.add_var(lb=0, obj= 1800)
          z_A4 = model.add_var(lb=0, obj= 2400)
          y_A1 = model.add_var(var_type=mip.BINARY)
          y_A2 = model.add_var(var_type=mip.BINARY)
          y_A3 = model.add_var(var_type=mip.BINARY)

          model.add_constr(z_A1 + z_A2 + z_A3 + z_A4 == 1)
          model.add_constr(y_A1 + y_A2 + y_A3 == 1)
          model.add_constr(z_A1 <= y_A1)
          model.add_constr(z_A2 <= y_A1 + y_A2)
          model.add_constr(z_A3 <= y_A2 + y_A3)
          model.add_constr(z_A4 <= y_A3)
          model.add_constr(prod_A == z_A1 * 0 + z_A2 * 100 + z_A3 * 200 + z_A4 * 300)

          model.objective += 9 * prod_B

          model.optimize()

          print(prod_A.x)
          print(prod_B.x)

The representation of the piecewise linear function is a bit cumbersome, but very modular.

No other part of the model needs access to the auxiliary variables $z$ and $y$.

What do we do when a function is not piecewise linear?

We can approximate the function piecewise linearly:

Depending on the required accuracy, we obtain more segments:

How do we deal with uncertainties?

Example: An energy company must decide every year how much gas it wants to buy at the current price without knowing the future price and demand.

This year, one unit of gas costs €5,-
The demand this year is 100 units
If we want to store gas, we have to pay a flat fee of €100,-
For each stored unit, we have to pay €0.5,-

However, we still do not know next year how high the demand and the price will be.

How do we determine the purchase quantity this year?

Idea: Package uncertainty in scenarios.

Scenario	Probability	Demand	Price €
Mild	40%	100	5
Cold	40%	150	7
Very cold	20%	200	9

Predicting good scenarios is not easy.

Need good models / analyses for prediction (e.g. with Machine Learning)

How granular do we want to subdivide the scenarios?

Example:

Weather model predicts Mild, Cold, Very cold with probabilities 0.4, 0.4, and 0.2 respectively.

Demand model predicts Average based on previous weather.

Example:

Historical Demands

Divide historical data into 3 equal amounts and calculate their mean:


          demand_1 = np.mean(data[data < np.percentile(x, 33)])
          demand_2 = np.mean(data[data < np.percentile(x, 66)])
          demand_3 = np.mean(data[data >= np.percentile(x, 66)])

Given the scenarios, how do we use them?

One part of the model represents the decision in the first year.
In the second year, we get a submodel for each scenario.

Variables:

Purchased units in year 1: $x_\text{buynow} \geq 0$
Stored units in year 1: $x_\text{store} \geq 0$
Storage used: $x_\text{usestore} \in \{0, 1\}$
Purchased units in year 2 per scenario:
$x_\text{buylater}^1, x_\text{buylater}^2, x_\text{buylater}^3 \geq 0$

Everything we buy now is stored or consumed:

\[ x_\text{buynow} = x_\text{store} + \text{demand now} \]

When we store something, we must use the storage:

\[ x_\text{store} \leq x_\text{usestore} \cdot \text{max demand} \]

(Big-M constraint)

In each scenario, we can decide how much is still needed:

\[ x_\text{store} + x_\text{buylater}^1 \geq \text{scenario 1 demand} \]
\[ x_\text{store} + x_\text{buylater}^2 \geq \text{scenario 2 demand} \]
\[ x_\text{store} + x_\text{buylater}^3 \geq \text{scenario 3 demand} \]

In each scenario, we only pay the expected value:

\[ \min 5 x_\text{buynow} + 100 x_\text{usestore} + 0.5 x_\text{store} + \\ 0.4 \cdot 5 x_\text{buylater}^1 + 0.4 \cdot 7 x_\text{buylater}^2 + 0.2 \cdot 9 x_\text{buylater}^3 \]


          import mip

          storage_base_cost = 100
          storage_unit_cost = 0.5
          cost_now = 5
          demand_now = 100

          scenarios = [{"cost": 5, "demand": 100, "prob": 0.4},
                       {"cost": 7, "demand": 150, "prob": 0.4},
                       {"cost": 9, "demand": 200, "prob": 0.2}]

          model = mip.Model()

          buy_now = model.add_var(lb = 0)
          store_now = model.add_var(lb = 0)
          use_storage = model.add_var(var_type=mip.BINARY)

          model.add_constr(buy_now == demand_now + store_now)
          model.add_constr(store_now <= use_storage * max(s["demand"] for s in scenarios))

          model.objective = mip.minimize(buy_now * cost_now + storage_base_cost * use_storage + storage_unit_cost * store_now)

          buy_later_vars = []
          for scenario in scenarios:
              buy_later = model.add_var(lb = 0)
              buy_later_vars.append(buy_later)
              model.add_constr(store_now + buy_later >= scenario["demand"])
              model.objective += scenario["prob"] * scenario["cost"] * buy_later

          model.optimize()

Stochastic models also work for pure LPs
(integer values are not necessary to represent the scenarios)