Modeling with ILPs

Example: Knapsack Problem

  • We want to fill a backpack with the most valuable items possible
  • However: We can only carry a certain amount of weight.
  • We have $n$ items.
  • Each item $i$ has a value $c_i$.
  • Each item $i$ has a weight $w_i$.
  • We can only carry a specific amount of weight $W$.

Complete ILP Formulation:

\[ \begin{align*} \max \quad & \sum_{i=1}^n c_i x_i \\ \text{s.t.} \quad & \sum_{i=1}^n w_i x_i \leq W \\ & x_i \in \{0, 1\} \quad & \forall i \in \{1, \dots, n\} \end{align*} \]

Example

W = 10

ItemValueWeight
A208
B157
C73

          import mip

          model = mip.Model()

          x_A = model.add_var(var_type=mip.BINARY)
          x_B = model.add_var(var_type=mip.BINARY)
          x_C = model.add_var(var_type=mip.BINARY)

          model.objective = mip.maximize(20 * x_A + 15 * x_B + 7 * x_C)
          model.add_constr(8 * x_A + 7 * x_B + 3 * x_C <= 10)

          model.optimize()

          print(x_A.x)
          print(x_B.x)
          print(x_C.x)
        

Unlike with shortest paths, integrality is essential.

Fractional solution:

  • Sort items by $\frac{\text{Value}}{\text{Weight}}$.
  • Fill the backpack in descending order. The last item is taken partially.
ItemValueWeight$\frac{\text{Value}}{\text{Weight}}$
A2082.5
C732.333
B1572.143
  • Choose A at 100%: Remaining capacity: 2
  • Choose C at 66%: Remaining capacity: 0

Total value: 24.666

A common pattern in modeling: Binary variables that represent an assignment.

The Knapsack problem is only "weakly" NP-hard.

If all weights are integers (with not too large numbers), we can efficiently solve this using Dynamic Programming.

Dynamic Programming Solution:

We create a table: What is the best solution for all weights up to W, with all items up to the i-th?

W012345678910
A00000000202020
A, B000000015202020
A, B, C000777715202022

    We either take:

  • The entry from the last row with the same weight
  • The entry from the last row minus the current weight + current value

\[ \text{opt}(i, w) = \max(\text{opt}(i-1, w), \text{opt}(i-1, w - w_i) + c_i) \]

Dynamic Programming = Recursion + Cache


          values = [20, 15, 7]
          weights = [8, 7, 3]

          import numpy as np

          def solve_knapsack(values, weights, limit):
              assert len(values) == len(weights)
              solution = np.zeros(shape=(len(values)+1, limit+1))

              for item_idx, (value, weight) in enumerate(zip(values, weights)):
                  for w in range(limit+1):
                      if w >= weight:
                          solution[item_idx+1, w] = max(
                              solution[item_idx, w],
                              solution[item_idx, w - weight] + value
                          )
                      else:
                          solution[item_idx+1, w] = solution[item_idx, w]
              return solution[len(values), limit]

          solve_knapsack(values, weights, 10)
        

This solution does not work well when we have very large weights and capacity.

Usually, this is not a problem.


https://xkcd.com/287/


          import mip

          prices = {
              "Mixed Fruit": 2.15,
              "French Fries": 2.75,
              "Side Salad": 3.35,
              "Hot Wings": 3.55,
              "Mozzerella Sticks": 4.20,
              "Sampler Plate": 5.80
          }

          m = mip.Model("restaurant order")

          variables = {
              item: m.add_var(var_type=mip.INTEGER) for item in prices
          }

          # m.objective = mip.maximize(sum(variables.values()))

          m.add_constr( sum(variables[item] * prices[item] for item in prices) == 15.05 )

          m.optimize()

          for item, var in variables.items():
              if var.x > 0.5:
                  print(f"{item}: {var.x}")
        

Extension: Bin Packing

We want to pack items of different sizes ($s_i$). A bin has size $B$.
How many bins do we need at least to pack all items, and how do we pack them?

How do we model this?

Idea: Variable $x_{i,j}$ whether item $i$ is packed in bin $j$.

But we don't know how many bins we will need in the end.

Idea: We take more bins than we need. For example, one per item.

We also take another variable $y_j$, which indicates whether we use the bin.

Objective function: Minimize the number of used bins

\[ \min \quad \sum_{j=1}^n y_j \]

Constraints: Capacity of used bins must not be exceeded.

\[ \sum_{i=1}^n s_i x_{i,j} \leq B y_j \quad \forall j \]

Here we have a common pattern: The binary variable $y_j$ activates the capacity of the constraint.

Constraints: Each item must be packed.

\[ \sum_{j=1}^n x_{i,j} = 1 \quad \forall i \]

\[ \begin{align*} \min \quad & \sum_{j=1}^n y_j \\ \text{s.t.} \quad & \sum_{i=1}^n s_i x_{i,j} \leq B y_j \quad & \forall j \\ & \sum_{j=1}^n x_{i,j} = 1 \quad & \forall i \\ & x_{i,j} \in \{0, 1\} \quad & \forall i, \forall j \\ & y_i \in \{0, 1\} \quad & \forall i \end{align*} \]

Next Task: We want to create a timetable

  • We have a set of courses $K$
  • We have a set of rooms $R$
    • Each course $k \in K$ has a set of permissible rooms $R(k)$
  • We have a set of time slots $T$
    • Each course $k \in K$ has a set of permissible time slots $T(k)$
  • We have pairwise conflicts $C$. $(k_1, k_2) \in C$ implies that $k_1$ and $k_2$ cannot take place simultaneously.

Goal: Minimize the number of conflicts.

    Variables

  • A variable $x_{k, t, r} \in \{0, 1\}$ for each course and allowable room and timeslot.
  • A variable $y_{k_1, k_2} \in \{0, 1\}$ for each conflict.

Common pattern: a binary variable with multiple indices that encodes which resource is assigned to another resource.

Modeling with many binary variables is often better than with few integer variables.

Objective function:

\[ \min \quad \sum_{(k_1, k_2) \in C} y_{k_1, k_2} \]

We could also elaborate here that some conflicts are more costly than others.

Constraints: Each course needs a time slot and a room.

\[ \sum_{t \in T(k)} \sum_{r \in R(k)} x_{k, t, r} = 1 \quad \forall k \in K \]

Constraints: No room may be double-booked at any given time.

\[ \sum_{k \in K} x_{k, t, r} \leq 1 \quad \forall t \in T, \forall r \in R \]

Constraints: We are only allowed to create a conflict if we set the corresponding variable.

\[ \sum_{r \in R(k_1)} x_{k_1, t, r} + \sum_{r \in R(k_2)} x_{k_2, t, r} \leq 1 + y_{k_1, k_2} \quad \forall (k_1, k_2) \in C, \forall t \in T \]

If we leave out $y$, we get a "hard" conflict.

Pattern: We have a constraint again that we can switch on or off with a variable.

As a result, the condition becomes optional and we can incorporate its violation into the objective function.

Note: Here we can exceed the side constraint by a maximum of 1.

Next example: Machine Scheduling

  • We have one machine and $n$ tasks.
  • Each task $i$ has a duration $p_i$.
  • Each task $i$ has a deadline $d_i$.

We want to minimize the number of tasks that miss their deadline.

Variables:

What we are looking for is primarily the order in which we work on the tasks. For each of the $n$ tasks, there are $n$ possible positions where it can be carried out.

  • A variable $x_{i, j} \in \{0, 1\}$: Task $i$ is executed at position $j$.
  • A variable $y_j \in \{0, 1\}$: The job executed at position $j$ has missed its deadline.
  • A variable $t_j \geq 0$: The time at which the task at the $j$-th position starts.

We have here a real MILP, as we mix integer with fractional variables.

Objective function:

\[ \min \quad \sum_{j=1}^n y_j \]

Constraints: Each task is executed exactly once.

\[ \sum_{j=1}^n x_{i, j} = 1 \quad \forall i \in \{1, \dots, n\} \]

Constraints: Exactly one task is executed at each position.

\[ \sum_{i=1}^n x_{i, j} = 1 \quad \forall j \in \{1, \dots, n\} \]

Constraints: Each task can only start after the previous one.

\[ t_{j} + \sum_{i=1}^n p_i x_{i,j} \leq t_{j+1} \quad \forall j \in \{1, \dots, n-1\} \]

With $\sum_{i=1}^n p_i x_{i,j}$ we only consider the processing time that is relevant at position $j$.

Constraints: A task may only miss its deadline if the corresponding $y$ variable is set:

\[ t_j + \sum_{i=1}^n p_i x_{i,j} \leq \sum_{i=1}^n d_i x_{i,j} + M y_j \quad \forall j \in \{1, \dots, n\} \]

Here we have two patterns: Once we determine the appropriate constant parameter in two places: \[ \sum_{i=1}^n p_i x_{i,j} \\ \sum_{i=1}^n d_i x_{i,j} \]

The same variable appears twice here. That makes the constraint easier to understand.

This can also be formulated as follows:

\[ t_j + \sum_{i=1}^n (p_i - d_i) x_{i,j} \leq M y_j \quad \forall j \in \{1, \dots, n\} \]

Or even like this (since we know that the $x$'s add up to 1):

\[ \sum_{i=1}^n (t_j + p_i - d_i) x_{i,j} \leq M y_j \quad \forall j \in \{1, \dots, n\} \]

The second pattern is the term $M y_j$. Here we disable the constraint when $y_j = 1$.

$M$ is a "large" number, so that the constraint will be easy to fulfill.

With these "Big-M" constraints we can model an "if-then" statement well.

However, they also make the model harder to solve.

Example: $M=10000$. A job is planned in a way that it is $10$ time units late.

With $y=0.001$, the constraint is turned off in the LP relaxation, but does not cost much in the objective function.

If we need such constraints, we should always try to choose $M$ as small as possible.

Here: What is the time a job can be late in the worst case scenario?

\[ M = \sum_{i=1}^n p_i - \min_i d_i \]

The job with the earliest deadline is processed very late.

Even better: For each position $j$, we consider how late the job at that position can be at the latest:

$P_j$: Let the processing time of the $j$ longest jobs be

\[ M_j = P_{j-1} - \min_i (\underbrace{d_i - p_i}_\text{latest start time}) \]

We can also model an "either-or" with this pattern:

\[ \ldots \leq M y \\ \ldots \leq M (1-y) \]

One of the conditions is on and the other one is off.

Example: I want the variable $x$ to be either less than 10 or greater than 100:

\[ x \leq 10 + M y \\ x \geq 100 - M (1-y) \]

If we can discretize time into a grid, we potentially get a more easily solvable model here.
(without Big-M constraints)

In practice, there are various variations of this problem:

  • Minimize the amount of minutes by which tasks are late.
  • Tasks can only be processed starting from a specific release time.
  • We have multiple machines.
  • Tasks can be interrupted and "parked".
  • ...

A major advantage of MILP modeling is that we can often integrate new requirements into existing models.

This is a major advantage in agile development processes.

Modeling Tip:

Discretize the problem in such a way that we can use binary variables.

Modeling tip:

Use constraints where all variables are only added:

\[ x_1 + x_2 + 2 x_3 \leq 7 \]

Modeling tip:

Use constraints where all constant terms are 1:

\[ x_1 + x_2 - x_3 \geq 1 \]

Modeling Tip:

Combine both tips from above:

\[ x_1 + x_2 + x_3 \leq 1 \]

What do we do if we want to represent a non-linear function?

Example:

  • We produce two goods, A and B.
  • We can produce a maximum of 300 units in total.
  • Product B brings 9 euros per unit.
  • Product A brings 10 euros for the first 100 units, 8 euros for the next 100 units, and then 6 euros per unit.

With constant revenues per unit of A, it would be simple:

\[ \begin{align*} \max \quad & 10 x_A + 9 x_B \\ \text{s.t.} \quad & x_A + x_B \leq 300 \\ & x_A, x_B \geq 0 \end{align*} \]

          import mip

          model = mip.Model()

          prod_A = model.add_var(lb=0)
          prod_B = model.add_var(lb=0)

          model.add_constr(prod_A + prod_B <= 300)
          model.objective = mip.maximize(10 * prod_A + 9 * prod_B)

          model.optimize()
        

Actually, we want to solve the following:

\[ \begin{align*} \max \quad & f(x_A) + 9 x_B \\ \text{s.t.} \quad & x_A + x_B \leq 300 \\ & x_A, x_B \geq 0 \end{align*} \]

with

\[ f(x) = \begin{cases} 10 x & \text{if } x \leq 100 \\ 1000 + 8 (x - 100) & \text{if } 100 < x \leq 200 \\ 1800 + 6 (x - 200) & \text{if } 200 < x \end{cases} \]

The revenues $f$ for A are a
piecewise linear function

    A piecewise linear function has:

  • $n$ linear segments
  • Break points $b_1, \ldots, b_{n+1}$
  • At break point $i$, the value $f(b_i)$

$x$ must be located on a segment between two break points:

\[ b_k \leq x \leq b_{k+1} \]

If we know the segment, we can represent $x$ as a linear combination of the break points:

\[ x = z_k b_k + z_{k+1} b_{k+1} \\ z_k + z_{k+1} = 1 \]

Example:

$x = 140$

Break points $b_2 = 100$ and $b_3 = 200$.

$x = 0.6 b_2 + 0.4 b_3 = 60 + 80$

Within the segment, $f$ is linear:

\[ f(x) = f(z_k b_k + z_{k+1} b_{k+1}) \]

\[ = z_k f(b_k) + z_{k+1} f(b_{k+1}) \]

Example:

$x = 140$

$f(140) = f(60 + 80) = f(0.6 b_2 + 0.4 b_3)$

$= 0.6 f(b_2) + 0.4 f(b_3)$

$= 0.6 \cdot 1000 + 0.4 \cdot 1800 = 1320$

We can now model $f$ using the break points:

\[ \begin{align*} \max \quad & {\color{red} 0 z_1 + 1000 z_2 + 1800 z_3 + 2400 z_4} + 9 x_B \\ \text{s.t.} \quad & x_A + x_B \leq 300 \\ & {\color{red} x_A = 0 z_1 + 100 z_2 + 200 z_3 + 300 z_4} \\ & {\color{red} z_1 + z_2 + z_3 + z_4 = 1} \\ & x_A, x_B \geq 0 \\ & {\color{red} z_1, z_2, z_3, z_4 \geq 0 } \end{align*} \]

Currently, the model can still mix the segments arbitrarily, for example:

$z_1 = 0.2 \quad z_2 = 0.3 \quad z_3 = 0 \quad z_4 = 0.5$

We need to ensure that only one segment is selected.

Insert a binary variable $y_1, \ldots, y_n$ per segment.

  • Choose only one segment:
    $y_1 + y_2 + y_3 = 1$
  • Use $z_1$ only if segment $1$ is active:
    $z_1 \leq y_1$
  • Use $z_2$ only if segment $1$ or $2$ is active:
    $z_2 \leq y_1 + y_2$
  • Use $z_3$ only if segment $2$ or $3$ is active:
    $z_3 \leq y_2 + y_3$
  • Use $z_4$ only if segment $3$ is active:
    $z_4 \leq y_3$

\[ \begin{align*} \max \quad & 0 z_1 + 1000 z_2 + 1800 z_3 + 2400 z_4 + 9 x_B \\ \text{s.t.} \quad & x_A + x_B \leq 300 \\ & x_A = 0 z_1 + 100 z_2 + 200 z_3 + 300 z_4 \\ & z_1 + z_2 + z_3 + z_4 = 1 \\ & {\color{red} y_1 + y_2 + y_3 = 1 }\\ & {\color{red} z_1 \leq y_1 }\\ & {\color{red} z_2 \leq y_1 + y_2 }\\ & {\color{red} z_3 \leq y_2 + y_3 }\\ & {\color{red} z_4 \leq y_3 }\\ & x_A, x_B \geq 0 \\ & z_1, z_2, z_3, z_4 \geq 0 \\ & {\color{red} y_1, y_2, y_3 \in \{0, 1\}} \end{align*} \]


          import mip

          model = mip.Model(sense=mip.MAXIMIZE)

          prod_A = model.add_var(lb=0)
          prod_B = model.add_var(lb=0)

          model.add_constr(prod_A + prod_B <= 300)

          z_A1 = model.add_var(lb=0, obj= 0)
          z_A2 = model.add_var(lb=0, obj= 1000)
          z_A3 = model.add_var(lb=0, obj= 1800)
          z_A4 = model.add_var(lb=0, obj= 2400)
          y_A1 = model.add_var(var_type=mip.BINARY)
          y_A2 = model.add_var(var_type=mip.BINARY)
          y_A3 = model.add_var(var_type=mip.BINARY)

          model.add_constr(z_A1 + z_A2 + z_A3 + z_A4 == 1)
          model.add_constr(y_A1 + y_A2 + y_A3 == 1)
          model.add_constr(z_A1 <= y_A1)
          model.add_constr(z_A2 <= y_A1 + y_A2)
          model.add_constr(z_A3 <= y_A2 + y_A3)
          model.add_constr(z_A4 <= y_A3)
          model.add_constr(prod_A == z_A1 * 0 + z_A2 * 100 + z_A3 * 200 + z_A4 * 300)

          model.objective += 9 * prod_B

          model.optimize()

          print(prod_A.x)
          print(prod_B.x)
        

The representation of the piecewise linear function is a bit cumbersome, but very modular.

No other part of the model needs access to the auxiliary variables $z$ and $y$.

What do we do when a function is not piecewise linear?

We can approximate the function piecewise linearly:

Depending on the required accuracy, we obtain more segments:

How do we deal with uncertainties?

Example: An energy company must decide every year how much gas it wants to buy at the current price without knowing the future price and demand.

  • This year, one unit of gas costs €5,-
  • The demand this year is 100 units
  • If we want to store gas, we have to pay a flat fee of €100,-
  • For each stored unit, we have to pay €0.5,-

However, we still do not know next year how high the demand and the price will be.

How do we determine the purchase quantity this year?

Idea: Package uncertainty in scenarios.

ScenarioProbabilityDemandPrice €
Mild40%1005
Cold40%1507
Very cold20%2009

Predicting good scenarios is not easy.

Need good models / analyses for prediction (e.g. with Machine Learning)

How granular do we want to subdivide the scenarios?

Example:

Weather model predicts Mild, Cold, Very cold with probabilities 0.4, 0.4, and 0.2 respectively.

Demand model predicts Average based on previous weather.

Example:

Historical Demands

Divide historical data into 3 equal amounts and calculate their mean:

          demand_1 = np.mean(data[data < np.percentile(x, 33)])
          demand_2 = np.mean(data[data < np.percentile(x, 66)])
          demand_3 = np.mean(data[data >= np.percentile(x, 66)])
        

Given the scenarios, how do we use them?

  • One part of the model represents the decision in the first year.
  • In the second year, we get a submodel for each scenario.

Variables:

  • Purchased units in year 1: $x_\text{buynow} \geq 0$
  • Stored units in year 1: $x_\text{store} \geq 0$
  • Storage used: $x_\text{usestore} \in \{0, 1\}$
  • Purchased units in year 2 per scenario:
    $x_\text{buylater}^1, x_\text{buylater}^2, x_\text{buylater}^3 \geq 0$

Everything we buy now is stored or consumed:

\[ x_\text{buynow} = x_\text{store} + \text{demand now} \]

When we store something, we must use the storage:

\[ x_\text{store} \leq x_\text{usestore} \cdot \text{max demand} \]

(Big-M constraint)

In each scenario, we can decide how much is still needed:

  • \[ x_\text{store} + x_\text{buylater}^1 \geq \text{scenario 1 demand} \]
  • \[ x_\text{store} + x_\text{buylater}^2 \geq \text{scenario 2 demand} \]
  • \[ x_\text{store} + x_\text{buylater}^3 \geq \text{scenario 3 demand} \]

In each scenario, we only pay the expected value:

\[ \min 5 x_\text{buynow} + 100 x_\text{usestore} + 0.5 x_\text{store} + \\ 0.4 \cdot 5 x_\text{buylater}^1 + 0.4 \cdot 7 x_\text{buylater}^2 + 0.2 \cdot 9 x_\text{buylater}^3 \]


          import mip

          storage_base_cost = 100
          storage_unit_cost = 0.5
          cost_now = 5
          demand_now = 100

          scenarios = [{"cost": 5, "demand": 100, "prob": 0.4},
                       {"cost": 7, "demand": 150, "prob": 0.4},
                       {"cost": 9, "demand": 200, "prob": 0.2}]

          model = mip.Model()

          buy_now = model.add_var(lb = 0)
          store_now = model.add_var(lb = 0)
          use_storage = model.add_var(var_type=mip.BINARY)

          model.add_constr(buy_now == demand_now + store_now)
          model.add_constr(store_now <= use_storage * max(s["demand"] for s in scenarios))

          model.objective = mip.minimize(buy_now * cost_now + storage_base_cost * use_storage + storage_unit_cost * store_now)

          buy_later_vars = []
          for scenario in scenarios:
              buy_later = model.add_var(lb = 0)
              buy_later_vars.append(buy_later)
              model.add_constr(store_now + buy_later >= scenario["demand"])
              model.objective += scenario["prob"] * scenario["cost"] * buy_later

          model.optimize()
        

Stochastic models also work for pure LPs
(integer values are not necessary to represent the scenarios)