Simplex Algorithm

We are looking for an optimal solution to a general Linear Program:

\[ \begin{aligned} \max \quad & c^T x \\ \text{subject to} \quad & Ax \leq b \\ & x \geq 0 \end{aligned} \]
  • The constraints always form a polyhedron
  • An optimal solution (if one exists) is always attained at a vertex

Can we just try out all the vertices?

How many vertices does this polyhedron have?

\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} x \leq \begin{bmatrix} 1 \\ 1 \end{bmatrix}\\ x \geq 0 \]

4 vertices

And this one?

\[ \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} x \leq \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\\ x \geq 0 \]

8 vertices

The number of vertices of our cube doubles with each additional dimension.

For a larger number of variables, trying out all vertices is therefore hopeless: with just 50 variables, a hypercube already has $2^{50} \approx 10^{15}$ vertices.

Iterative approach: we start at a vertex and move to a neighboring vertex that improves the objective.

Our example problem:

\[ \begin{aligned} \max \quad & 0.6 x_1 + 0.5 x_2 \\ \text{subject to} \quad & 0.8 x_1 + 0.5 x_2 \leq 1500 \\ & 0.2 x_1 + 0.5 x_2 \leq 900 \\ & x_1 \geq 0 \\ & x_2 \geq 0 \end{aligned} \]

Where do we start?

Trivial starting solution: $x = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

In which direction do we move?
Do we increase $x_1$ or $x_2$?

Objective function was: $\max 0.6 x_1 + 0.5 x_2$.

A step in the $x_1$ direction initially gains more than a step in the $x_2$ direction.

How large can we make $x_1$ at most?

  • $0.8 x_1 + 0.5 x_2 \leq 1500$
  • $0.2 x_1 + 0.5 x_2 \leq 900$

$x_1 = \min (\frac{1500}{0.8}, \frac{900}{ 0.2})$
$ = \min (1875, 4500) = 1875$
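
As a small sketch, the same ratio computation in NumPy (variable names are our own):

        import numpy as np

        b = np.array([1500, 900])
        a1 = np.array([0.8, 0.2])   # coefficients of x_1 in both constraints

        x1_max = np.min(b / a1)     # => 1875.0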

New solution: $x = \begin{bmatrix} 1875 \\ 0 \end{bmatrix}$

Objective function value: $0.6 \cdot 1875 = 1125$

Can we improve the solution by increasing $x_2$ now?

Not quite so simple: we must not leave the polyhedron.

If we increase $x_2$, we must decrease $x_1$. But by how much?

$0.8 x_1 + 0.5 x_2 = 1500$

If we increase $x_2$ by $1$, $x_1$ decreases by $\frac{0.5}{0.8} = 0.625$.

Shouldn't we also consider the constraint $0.2 x_1 + 0.5 x_2 \leq 900$?

$0.2 \cdot 1875 = 375 < 900$
The constraint is currently not "active".

Is this trade-off worth it?

Objective function was: $\max 0.6 x_1 + 0.5 x_2$.

Increasing $x_2$ by $1$ changes the objective function value by
$0.6 \cdot (- 0.625) + 0.5 \cdot 1 = 0.125$

Increasing $x_2$ makes sense.

How far can we now increase $x_2$?

Until the next constraint becomes active.

  • $0.2 x_1 + 0.5 x_2 \leq 900$
  • $x_1 \geq 0$

If we increase $x_2$ until $x_1 = 0$ holds, we get:

\[ 0.8 \cdot 0 + 0.5 \cdot x_2 = 1500 \]

$x_2 =\frac{1500}{0.5} = 3000$.

This solution violates the second constraint:
$0.2 \cdot 0 + 0.5 \cdot 3000 = 1500 \not\leq 900$

What is the solution when we increase $x_2$ until the constraint $0.2 x_1 + 0.5 x_2 \leq 900$ becomes active?

Then the following holds:
$ 0.8 \cdot x_1 + 0.5 \cdot x_2 = 1500\\ 0.2 \cdot x_1 + 0.5 \cdot x_2 = 900 $

or:
$ \begin{bmatrix} 0.8 & 0.5 \\ 0.2 & 0.5 \end{bmatrix} x = \begin{bmatrix} 1500 \\ 900 \end{bmatrix} $

To solve such a system of equations, there are several possibilities. For example:

  • Gaussian elimination
  • Calculating the inverse matrix
  • Matrix decomposition
  • Iterative approximations
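
For example, NumPy's np.linalg.solve solves such a system directly (internally via an LU factorization), without ever forming the inverse:

        import numpy as np

        A = np.array([[0.8, 0.5], [0.2, 0.5]])
        b = np.array([1500, 900])

        np.linalg.solve(A, b)   # => [1000. 1400.]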

If we invert matrix $A$, we get the solution:

$A x = b \quad \Leftrightarrow\quad x = A^{-1} b$

We can calculate the inverse matrix using, for example, the Python NumPy package:


          import numpy as np

          A = np.array([[0.8, 0.5], [0.2, 0.5]])
          b = np.array([1500, 900])

          A_inv = np.linalg.inv(A)   # inverse of the constraint matrix
          x = A_inv.dot(b)           # solution of A x = b

          print(A_inv)
          print(x)   # => [1000. 1400.]

This leads us to the solution: \[ x = \begin{bmatrix} \frac{5}{3} & -\frac{5}{3} \\ -\frac{2}{3} & \frac{8}{3} \end{bmatrix} \begin{bmatrix} 1500 \\ 900 \end{bmatrix} = \begin{bmatrix} 1000 \\ 1400 \end{bmatrix} \]

This also satisfies the other constraint:
$x_1 \geq 0$.

Our new solution: $x = \begin{bmatrix} 1000 \\ 1400 \end{bmatrix}$

Can we further improve?

We could further increase $x_2$.

When we further increase $x_2$, we need to decrease $x_1$ further: \[ 0.2 \cdot x_1 + 0.5 \cdot x_2 = 900 \]

Each further step towards $x_2$ reduces $x_1$ by
$\frac{0.5}{0.2}=2.5$

So, if we take another step towards $x_2$, the objective function changes by: \[ 0.6 \cdot (-2.5) + 0.5 \cdot 1 = -1.0 \]

So, we cannot further improve in that direction.

We now want to make the process more systematic so that it can handle arbitrary problems.

We start with any LP in canonical form: \[ \begin{aligned} \max \quad & c^T x \\ \text{subject to} \quad & Ax \leq b \\ & x \geq 0 \end{aligned} \]

If we have a linear equation system $Sx = b$, with $S \in \mathbb{R}^{n \times n}$, then we can solve it with the inverse: \[ x = S^{-1} b \]

Here, it is important that we have equalities and that the matrix is square.

We can transform an inequality into an equation with an additional variable:

\[ a_{i1} x_1 + \dots + a_{in} x_n \leq b_i \Leftrightarrow\\ \quad a_{i1} x_1 + \dots + a_{in} x_n + z_i = b_i, \quad z_i \geq 0 \]

"Slack variable"

Each slack variable indicates how far we are from the boundary of the inequality.
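
Applied to the first constraint of our example:

\[ 0.8 x_1 + 0.5 x_2 + z_1 = 1500, \quad z_1 \geq 0 \]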

So, we can formulate our constraints as equations:

\[ \begin{aligned} \max \quad & c^T x \\ \text{subject to} \quad & Ax + Iz = b \\ & x \geq 0 \\ & z \geq 0 \end{aligned} \]

Our constraints are now:

\[ \underbrace{\begin{bmatrix} A & I \end{bmatrix}}_{S} \begin{bmatrix} x \\ z \end{bmatrix} = b \]

$S$ is not square, but has more columns than rows.

If we select a subset $B$ of columns, as large as the number of rows, then the submatrix $S_B$ is square.

Example

\[ S = \begin{bmatrix} 0.8 & 0.5 & 1 & 0 \\ 0.2 & 0.5 & 0 & 1 \end{bmatrix} \quad B = [0, 3] \] \[ S_B = \begin{bmatrix} 0.8 & 0 \\ 0.2 & 1 \end{bmatrix} \]
With $S_B^{-1}$ we can compute a solution: \[ \bar b = S_B^{-1} b = \begin{bmatrix} 1.25 & 0 \\ -0.25 & 1 \end{bmatrix} \begin{bmatrix} 1500 \\ 900 \end{bmatrix} = \begin{bmatrix} 1875 \\ 525 \end{bmatrix} \]

$\begin{bmatrix}x \\ z \end{bmatrix} = \begin{bmatrix} 1875 \\ 0 \\ 0 \\ 525 \end{bmatrix}$

So, in this solution, we produce 1875g of dough A and have 525g of chocolate left over.
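
As a small sketch, we can reproduce this basic solution in NumPy by selecting the basis columns by index:

        import numpy as np

        A = np.array([[0.8, 0.5], [0.2, 0.5]])
        S = np.hstack([A, np.eye(2)])       # S = [A  I]
        b = np.array([1500, 900])

        B = [0, 3]                          # basis columns: x_1 and z_2
        b_bar = np.linalg.solve(S[:, B], b)
        print(b_bar)   # => [1875.  525.]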

We call the set of columns $B$ the basis.

The basic variables are the only ones that may take a value greater than $0$.

All columns not part of the basis are denoted by $N$.

All variables corresponding to a column from $N$ are 0 in the solution.

Each basis corresponds to a vertex of the polyhedron of constraints.

The geometric problem now corresponds to the question of which variables we solve the equation system for.

We call the current solution $\bar b = S_B^{-1} b$ the basic solution.

The step to the next vertex consists of the following operations:

  • Decide which new column joins the basis.
  • Decide which column leaves the basis.

This decision is called a "pivot operation".

Select new basic column:

For each non-basic variable, we determine how much the objective function increases with a step in the corresponding direction:

\[ \bar c = c_N - S_N^T {S_B^{-1}}^T c_B \]


              import numpy as np

              Sb = np.array([[0.8, 0], [0.2, 1]])   # basic columns: x_1 and z_2
              Sn = np.array([[0.5, 1], [0.5, 0]])   # non-basic columns: x_2 and z_1
              cn = np.array([0.5, 0])               # objective coefficients of the non-basic variables
              cb = np.array([0.6, 0])               # objective coefficients of the basic variables

              # reduced costs: c_N - S_N^T (S_B^{-1})^T c_B
              cn - Sn.T.dot(np.linalg.inv(Sb).T.dot(cb))

As a result, we get \[ \begin{bmatrix} 0.125 \\ -0.75 \end{bmatrix} \]

Increasing $x_2$ by 1 increases the objective function value by $0.125$.
Increasing $z_1$, on the other hand, worsens the objective function.

These values $\bar c$ are called reduced costs.

If no reduced cost is positive, we have found an optimal solution because no further improvement is possible.

We choose the variable $i \in N$ with the highest reduced cost (here: $x_2$).

This is not necessarily the best choice. There are more sophisticated strategies for choosing the new basic column.

Now we need to choose a variable $j \in B$ to leave the basis.

Requirement: No constraint may be violated.

So, no variable can become negative.

Let $\bar S = S_B^{-1} S_N$.

If we remove column $j$ from the basis, then the new variable takes the value: \[ \frac{\bar b_j}{\bar S_{j i}} \]

The current basic solution $\bar b$ is non-negative. So, for the new value not to become negative, $\bar S_{j i}$ must be greater than $0$.

We may increase the new variable only as far as the nearest blocking constraint allows, so that we do not overshoot the next vertex.

Choose $j$ with the smallest non-negative new variable value: \[ j = \operatornamewithlimits{arg\, min}_{j \in B, \bar S_{j i} > 0} \frac{\bar b_j}{\bar S_{j i}} \]


        import numpy as np

        Sb = np.array([[0.8, 0], [0.2, 1]])
        Sn = np.array([[0.5, 1], [0.5, 0]])
        b = np.array([1500, 900])

        Sb_inv = np.linalg.inv(Sb)
        basic_solution = Sb_inv.dot(b)   # current values of the basic variables
        S_bar = Sb_inv.dot(Sn)

        # Ratio test for entering column i = 0 (x_2): only rows with
        # S_bar[j, 0] > 0 are candidates, which also avoids division by zero
        ratios = np.full(len(b), np.inf)
        candidates = S_bar[:, 0] > 0
        ratios[candidates] = basic_solution[candidates] / S_bar[candidates, 0]
        j = np.argmin(ratios)   # index of the leaving basic column

In the example, $x_2$ can take either $3000$ or $1400$ as the new value.

The current basis was $B = [0, 3]$. So, $z_2$ must leave the basis.

New basis is $B = [0, 1]$.

$\bar b = S_B^{-1} b = \begin{bmatrix} 1000 \\ 1400 \end{bmatrix}$

To check if it continues, we again calculate the reduced costs:

\[ \bar c = c_N - S_N^T {S_B^{-1}}^T c_B = \begin{bmatrix} -\frac{2}{3} \\ -\frac{1}{3} \end{bmatrix} \]

So, we have found an optimal solution.

However, the reduced costs tell us more:

  • If we increase the distance to the first inequality by $1$, the objective function value decreases by $\frac{2}{3}$.
  • Conversely: if we could exceed the inequality by $1$, the objective function value would increase by $\frac{2}{3}$.

        import numpy as np

        Sb = np.array([[0.8, 0.5], [0.2, 0.5]])
        cb = np.array([0.6, 0.5])

        # increase the right-hand side of the first constraint by 1 (1500 -> 1501)
        b_bar = np.linalg.inv(Sb).dot(np.array([1501, 900]))
        cb.dot(b_bar)   # new objective function value

        # => 1300.6666666666665

An extra gram of flour thus brings $0.667$ cents more, an extra gram of chocolate only $0.333$ cents.

The added value increases linearly until the current basis becomes infeasible and we need to find a new vertex. (Such sensitivities are known as shadow prices.)

We have already covered the most important steps for a working Simplex algorithm.

Is this method better than trying all vertices?

Bad news: in the worst case, the algorithm visits exponentially many vertices (Klee-Minty cube).

However, this rarely happens in practice. Hence, the algorithm almost always works well.

In general, we still need to address a few special cases to solve arbitrary LPs.

We have assumed that we have a valid solution at the beginning.

In our case, that was $x=0$, meaning the basis consisted only of slack variables.

However, it might be that $x=0$ is not valid:

\[ - x_1 \leq - 400 \]

(We need to produce at least 400g of dough A)

How do we obtain an initially valid solution?

We can introduce an auxiliary variable $z' \geq 0$ with $-x_1 - z' = -400$.

Then we first search for a solution that minimizes $z'$.
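
For our example constraint, this Phase 1 problem can be sketched as follows (the remaining constraints stay unchanged):

\[ \begin{aligned} \min \quad & z' \\ \text{subject to} \quad & -x_1 - z' = -400 \\ & x_1 \geq 0, \; z' \geq 0 \end{aligned} \]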

Searching for the valid starting solution is called Phase 1 of the Simplex Algorithm and is potentially as time-consuming as Phase 2.

It might happen that we cannot find a valid starting solution (i.e., $z' > 0$ remains).

Then the LP is infeasible.

Example

\[ \begin{aligned} \max \quad & x_1\\ \text{subject to} \quad & x_1 \leq 10 \\ & - x_1 \leq -12 \\ & x \geq 0 \\ \end{aligned} \]

There is another special case:

\[ \begin{aligned} \max \quad & x_1\\ \text{subject to} \quad & - x_1 \leq -12 \\ & x \geq 0 \\ \end{aligned} \]

The solution is unbounded.

We can notice this case in the Simplex when we cannot find a column to leave the basis because all $\bar S_{j i} \leq 0$.
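
A minimal sketch of this check; the array S_bar_i and its values are made up for illustration:

        import numpy as np

        # column of S_bar belonging to the entering variable (hypothetical values)
        S_bar_i = np.array([-0.5, -1.0])

        if not np.any(S_bar_i > 0):
            print("No column can leave the basis: the LP is unbounded")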

In practice, "degenerate" vertices can cause problems:

Here, infinite loops can occur. Good pivot rules and numerical tricks can help avoid this in practice.

The most expensive step in the Simplex method is the computation of $S_B^{-1}$.

The trick is not to recalculate the inverse at every step, but only to make updates.

For example, through LU decomposition and Forrest-Tomlin updates.

Lower Triangular Matrix (L)

A (square) matrix that has non-zero entries only on or below the diagonal:

\[ L = \begin{bmatrix} l_{11} & 0 & 0 & \dots & 0 \\ l_{21} & l_{22} & 0 & \dots & 0 \\ l_{31} & l_{32} & l_{33} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & l_{n3} & \dots & l_{nn} \end{bmatrix} \]

Suppose we want to solve $Lx = b$.

\[ \begin{align*} &l_{11} x_1 &= b_1 \\ &l_{21} x_1 + l_{22} x_2 &= b_2 \\ &l_{31} x_1 + l_{32} x_2 + l_{33} x_3 &= b_3 \\ &\vdots \\ &l_{n1} x_1 + l_{n2} x_2 + l_{n3} x_3 + \dots + l_{nn} x_n &= b_n \end{align*} \]

Solving for $x_1$ is simple: \[ x_1 = \frac{b_1}{l_{11}} \]

Once we know $x_1$, solving for $x_2$ is also simple:

\[ x_2 = \frac{b_2 - l_{21} x_1}{l_{22}} \]

We can quickly solve the entire system of equations analogously, row by row. This procedure is called forward substitution.
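
A minimal sketch of forward substitution in NumPy (the function name is our own):

        import numpy as np

        def forward_substitution(L, b):
            # solve L x = b for a lower triangular matrix L, row by row
            n = len(b)
            x = np.zeros(n)
            for i in range(n):
                # subtract the contributions of the already-known x_1, ..., x_{i-1}
                x[i] = (b[i] - L[i, :i].dot(x[:i])) / L[i, i]
            return x

        L = np.array([[2.0, 0.0], [1.0, 3.0]])
        b = np.array([4.0, 5.0])
        print(forward_substitution(L, b))   # => [2. 1.]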

An LU decomposition of a matrix is the factorization of the matrix into the product of a Lower Triangular Matrix (L) and an Upper Triangular Matrix (U):

\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \dots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \dots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \dots & a_{nn} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0 & \dots & 0 \\ l_{21} & l_{22} & 0 & \dots & 0 \\ l_{31} & l_{32} & l_{33} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & l_{n3} & \dots & l_{nn} \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} & \dots & u_{1n} \\ 0 & u_{22} & u_{23} & \dots & u_{2n} \\ 0 & 0 & u_{33} & \dots & u_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & u_{nn} \end{bmatrix} \]

We can now quickly solve the system of equations $Ax = b$:

  • Solving $Ly = b$ is easy.
  • Similarly, solving $Ux = y$ is easy.
  • This gives us the solution for $LUx = b$
    and thus for $Ax = b$.
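
A small sketch using SciPy's lu_factor and lu_solve (assuming SciPy is available): we factorize once and then solve via the two triangular systems.

        import numpy as np
        from scipy.linalg import lu_factor, lu_solve

        A = np.array([[0.8, 0.5], [0.2, 0.5]])
        b = np.array([1500.0, 900.0])

        lu, piv = lu_factor(A)          # LU decomposition (with pivoting)
        x = lu_solve((lu, piv), b)      # two cheap triangular solves
        print(x)   # => [1000. 1400.]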

Once we have an LU decomposition and only one row and one column changes in the original matrix, the decomposition can be updated with little effort.