Simplex Algorithm
We are looking for an optimal solution for a general Linear Program:
\[ \begin{aligned} \max \quad & c^T x \\ \text{subject to} \quad & Ax \leq b \\ & x \geq 0 \end{aligned} \]Can we just try out all the vertices?
How many vertices does this polyhedron have?
\[
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}
x \leq \begin{bmatrix}
1 \\
1
\end{bmatrix}\\
x \geq 0
\]
4 vertices
And this one?
\[ \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} x \leq \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\\ x \geq 0 \]8 vertices
The number of vertices of our cube doubles with each additional dimension: the $n$-dimensional cube has $2^n$ vertices.
For a larger number of variables, we therefore cannot try out all vertices.
Iterative approach: we start at a vertex and move to a neighboring vertex that improves the objective.
Our example problem:
\[ \begin{aligned} \max \quad & 0.6 x_1 + 0.5 x_2 \\ \text{subject to} \quad & 0.8 x_1 + 0.5 x_2 \leq 1500 \\ & 0.2 x_1 + 0.5 x_2 \leq 900 \\ & x_1 \geq 0 \\ & x_2 \geq 0 \end{aligned} \]Where do we start?
Trivial starting solution: $x = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$
In which direction do we move?
Do we increase $x_1$ or $x_2$?
Objective function was: $\max 0.6 x_1 + 0.5 x_2$.
A step in the direction of $x_1$ initially gains more than a step in the direction of $x_2$.
How large can we make $x_1$ at most?
$x_1 = \min (\frac{1500}{0.8}, \frac{900}{ 0.2})$
$ = \min (1875, 4500) = 1875$
New solution: $x = \begin{bmatrix} 1875 \\ 0 \end{bmatrix}$
Objective function value: $0.6 \cdot 1875 = 1125$
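We can reproduce this first ratio test numerically; a minimal sketch, assuming the constraint data from above:
import numpy as np
A = np.array([[0.8, 0.5], [0.2, 0.5]])
b = np.array([1500, 900])
# Each row i limits x_1 (with x_2 = 0) to b_i / A[i, 0]
print(np.min(b / A[:, 0]))  # => 1875.0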
Can we improve the solution by increasing $x_2$ now?
It is not that simple: we must not step outside the polyhedron.
If we increase $x_2$, we must decrease $x_1$. But by how much?
$0.8 x_1 + 0.5 x_2 = 1500$
If we increase $x_2$ by $1$, $x_1$ decreases by $\frac{0.5}{0.8} = 0.625$.
Shouldn't we also consider the constraint $0.2 x_1 + 0.5 x_2 \leq 900$?
$0.2 \cdot 1875 = 375 < 900$
The constraint is currently not "active".
Is this trade-off worth it?
Objective function was: $\max 0.6 x_1 + 0.5 x_2$.
Increasing $x_2$ by $1$ changes the objective function value by
$0.6 \cdot (- 0.625) + 0.5 \cdot 1 = 0.125$
Increasing $x_2$ makes sense.
How far can we now increase $x_2$?
Until the next constraint becomes binding.
If we increase $x_2$ until $x_1 = 0$ holds, then we have:
\[ 0.8 \cdot 0 + 0.5 \cdot x_2 = 1500 \]$x_2 =\frac{1500}{0.5} = 3000$.
This solution violates the second constraint:
$0.2 \cdot 0 + 0.5 \cdot 3000 = 1500 \not\leq 900$
What is the solution if we increase $x_2$ until $0.2 x_1 + 0.5 x_2 \leq 900$ becomes tight?
Then it holds:
$
0.8 \cdot x_1 + 0.5 \cdot x_2 = 1500\\
0.2 \cdot x_1 + 0.5 \cdot x_2 = 900
$
or:
$
\begin{bmatrix} 0.8 & 0.5 \\ 0.2 & 0.5 \end{bmatrix}
x = \begin{bmatrix} 1500 \\ 900 \end{bmatrix}
$
There are several ways to solve such a system of equations. For example:
If we invert the matrix $A$, we get the solution:
$A x = b \quad \Leftrightarrow\quad x = A^{-1} b$
We can calculate the inverse matrix, for example, using the Python package NumPy:
import numpy as np
A = np.array([[0.8, 0.5], [0.2, 0.5]])
A_inv = np.linalg.inv(A)
print(A_inv)
# Multiplying by b yields the solution:
print(A_inv.dot(np.array([1500, 900])))  # => [1000. 1400.]
This leads us to the solution: \[ x = \begin{bmatrix} \frac{5}{3} & -\frac{5}{3} \\ -\frac{2}{3} & \frac{8}{3} \end{bmatrix} \begin{bmatrix} 1500 \\ 900 \end{bmatrix} = \begin{bmatrix} 1000 \\ 1400 \end{bmatrix} \]
This also satisfies the other constraint:
$x_1 \geq 0$.
Our new solution: $x = \begin{bmatrix} 1000 \\ 1400 \end{bmatrix}$
Can we further improve?
We could further increase $x_2$.
When we further increase $x_2$, we need to decrease $x_1$ further: \[ 0.2 \cdot x_1 + 0.5 \cdot x_2 = 900 \]
Each further step towards $x_2$ reduces $x_1$ by
$\frac{0.5}{0.2}=2.5$
So, if we take another step towards $x_2$, the objective function changes by: \[ 0.6 \cdot (-2.5) + 0.5 \cdot 1 = -1.0 \]
So, we cannot further improve in that direction.
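As a cross-check of the hand calculation (not part of the Simplex method itself), an off-the-shelf solver finds the same vertex. Note that scipy.optimize.linprog minimizes, so we negate the objective:
from scipy.optimize import linprog
# Variables are nonnegative by default in linprog
res = linprog(c=[-0.6, -0.5], A_ub=[[0.8, 0.5], [0.2, 0.5]], b_ub=[1500, 900])
print(res.x)  # => approximately [1000. 1400.]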
We now want to make the process systematic so that we can handle arbitrary problems.
We start with any LP in canonical form: \[ \begin{aligned} \max \quad & c^T x \\ \text{subject to} \quad & Ax \leq b \\ & x \geq 0 \end{aligned} \]
If we have a linear equation system $Sx = b$ with an invertible matrix $S \in \mathbb{R}^{n \times n}$, then we can solve it with the inverse: \[ x = S^{-1} b \]
Here, it's important to have equalities and that the matrix is square.
We can transform an inequality into an equation with an additional variable:
\[ a_{i1} x_1 + \dots + a_{in} x_n \leq b_i \Leftrightarrow\\ \quad a_{i1} x_1 + \dots + a_{in} x_n + z_i = b_i, \quad z_i \geq 0 \]
"Slack variable"
Each slack variable indicates how far we are from the boundary of the inequality.
So, we can formulate our constraints as equations:
\[ \begin{aligned} \max \quad & c^T x \\ \text{subject to} \quad & Ax + Iz = b \\ & x \geq 0 \\ & z \geq 0 \end{aligned} \]Our constraints are now:
\[ \underbrace{\begin{bmatrix} A & I \end{bmatrix}}_{S} \begin{bmatrix} x \\ z \end{bmatrix} = b \]$S$ is not square, but has more columns than rows.
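Building $S$ in NumPy is a one-liner; a minimal sketch with our example data:
import numpy as np
A = np.array([[0.8, 0.5], [0.2, 0.5]])
S = np.hstack([A, np.eye(2)])  # S = [A I]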
If we select a subset $B$ of columns, as large as the number of rows, then the submatrix $S_B$ is square.
Example
\[ S = \begin{bmatrix} 0.8 & 0.5 & 1 & 0 \\ 0.2 & 0.5 & 0 & 1 \end{bmatrix} \quad B = [0, 3] \] \[ S_B = \begin{bmatrix} 0.8 & 0 \\ 0.2 & 1 \end{bmatrix} \]$\begin{bmatrix}x \\ z \end{bmatrix} = \begin{bmatrix} 1875 \\ 0 \\ 0 \\ 525 \end{bmatrix}$
So, in this solution, we produce 1875g of dough A and have 525g of chocolate left over.
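We can reproduce this basic solution by selecting the columns in $B$ and solving the resulting square system; a minimal sketch:
import numpy as np
S = np.array([[0.8, 0.5, 1, 0], [0.2, 0.5, 0, 1]])
b = np.array([1500, 900])
B = [0, 3]
print(np.linalg.solve(S[:, B], b))  # => [1875. 525.]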
We call the set of columns $B$ the basis.
The basic variables are those that may take a value greater than $0$.
All columns not part of the basis are denoted by $N$.
All variables corresponding to a column from $N$ are 0 in the solution.
Each basis corresponds to a vertex of the polyhedron of constraints.
Geometrically, the choice of a vertex now corresponds to the question of which variables we solve the equation system for.
We call the current solution $\bar b = S_B^{-1} b$ the basic solution.
The step to the next vertex consists of the following operations; together, they are called a "pivot operation".
Select the new basic column:
For each non-basic variable, we determine how much the objective function increases with a step in the corresponding direction:
\[ \bar c = c_N - S_N^T {S_B^{-1}}^T c_B \]
import numpy as np
Sb = np.array([[0.8, 0], [0.2, 1]])  # basic columns (x_1, z_2)
Sn = np.array([[0.5, 1], [0.5, 0]])  # non-basic columns (x_2, z_1)
cn = np.array([0.5, 0])              # objective coefficients of x_2, z_1
cb = np.array([0.6, 0])              # objective coefficients of x_1, z_2
cn - Sn.T.dot(np.linalg.inv(Sb).T.dot(cb))
As a result, we get \[ \begin{bmatrix} 0.125 \\ -0.75 \end{bmatrix} \]
Increasing $x_2$ by 1 increases the objective function value by $0.125$.
Increasing $z_1$, on the other hand, worsens the objective function.
These values $\bar c$ are called reduced costs.
If no reduced cost is positive, we have found an optimal solution, because no further improvement is possible.
We choose the variable $i \in N$ with the highest reduced costs (here $x_2$).
This is not necessarily the best choice. There are more sophisticated strategies for choosing the new basic column.
Now we need to choose a variable $j \in B$ to leave the basis.
Requirement: no constraint may be violated.
So, no variable may become negative.
Let $\bar S = S_B^{-1} S_N$.
If we remove column $j$ from the basis, then the new variable takes the value: \[ \frac{\bar b_j}{\bar S_{j i}} \]
The current basic solution $\bar b$ is nonnegative. So, for the new value not to become negative, $\bar S_{j i}$ must be greater than $0$.
We may move the new variable only as far as the tightest constraint allows; otherwise, we overshoot the next vertex.
Choose $j$ with the smallest non-negative new variable value: \[ j = \operatornamewithlimits{arg\, min}_{j \in B, \bar S_{j i} > 0} \frac{\bar b_j}{\bar S_{j i}} \]
import numpy as np
Sb = np.array([[0.8, 0], [0.2, 1]])
Sn = np.array([[0.5, 1], [0.5, 0]])
b = np.array([1500, 900])
Sb_inv = np.linalg.inv(Sb)
basic_solution = Sb_inv.dot(b)
S_bar = Sb_inv.dot(Sn)
# Ratio test for the entering column 0 (x_2): only rows with a positive entry qualify
ratios = np.full(len(b), np.inf)
positive = S_bar[:, 0] > 0
ratios[positive] = basic_solution[positive] / S_bar[positive, 0]
j = np.argmin(ratios)  # position in the basis of the leaving variable
In the example, $x_2$ could take either $3000$ or $1400$ as its new value; the ratio test picks the smaller one, $1400$.
The current basis was $B = [0, 3]$, so $z_2$ must leave the basis.
New basis is $B = [0, 1]$.
$\bar b = S_B^{-1} b = \begin{bmatrix} 1000 \\ 1400 \end{bmatrix}$
To check whether we can continue improving, we again calculate the reduced costs:
\[ \bar c = c_N - S_N^T {S_B^{-1}}^T c_B = \begin{bmatrix} -\frac{2}{3} \\ -\frac{1}{3} \end{bmatrix} \]
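We can verify this with the same computation as before, now with the two slack columns as non-basic columns; a minimal sketch:
import numpy as np
Sb = np.array([[0.8, 0.5], [0.2, 0.5]])  # basic columns (x_1, x_2)
Sn = np.eye(2)                           # non-basic columns (z_1, z_2)
cn = np.array([0.0, 0.0])
cb = np.array([0.6, 0.5])
print(cn - Sn.T.dot(np.linalg.inv(Sb).T.dot(cb)))  # => approx. [-0.667 -0.333]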
So, we have found an optimal solution.
However, the reduced costs tell us more:
import numpy as np
Sb = np.array([[0.8, 0.5], [0.2, 0.5]])
cb = np.array([0.6, 0.5])
# One extra gram of flour: b = [1501, 900] instead of [1500, 900]
b_bar = np.linalg.inv(Sb).dot(np.array([1501, 900]))
cb.dot(b_bar)
# => 1300.6666666666665
An extra gram of flour thus brings $0.667$ cents more, an extra gram of chocolate only $0.333$ cents (exactly the magnitudes of the reduced costs of the slack variables).
The added value increases linearly only until the basic solution becomes infeasible and we need to move to a new vertex.
We have already covered the most important steps for a working Simplex algorithm.
Is this method better than trying all vertices?
Bad news: In the worst case, the algorithm must explore all vertices. (Klee-Minty cube)
However, this rarely happens in practice. Hence, the algorithm almost always works well.
In general, we still need to address a few special cases to solve arbitrary LPs.
We have assumed that we have a valid solution at the beginning.
In our case, that was $x=0$, meaning the basis consisted only of slack variables.
However, it might be that $x=0$ is not valid:
\[ - x_1 \leq - 400 \](We need to produce at least 400g of dough A)
How do we obtain an initially valid solution?
We can introduce an auxiliary variable $z' \geq 0$ with $-x_1 - z' = -400$.
Then we first search for a solution that merely minimizes $z'$ (ignoring the original objective).
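For the single constraint above, this auxiliary problem reads (a sketch; the original objective is set aside for now):
\[ \begin{aligned} \min \quad & z' \\ \text{subject to} \quad & -x_1 - z' = -400 \\ & x_1 \geq 0, \; z' \geq 0 \end{aligned} \]Starting from $x_1 = 0$, $z' = 400$; the minimum $z' = 0$ is reached at $x_1 = 400$, a valid starting solution for Phase 2.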
Searching for the valid starting solution is called Phase 1 of the Simplex Algorithm and is potentially as time-consuming as Phase 2.
It might happen that we cannot find a valid starting solution (i.e., $z' > 0$ remains).
Then the LP is infeasible.
Example
\[ \begin{aligned} \max \quad & x_1\\ \text{subject to} \quad & x_1 \leq 10 \\ & - x_1 \leq -12 \\ & x \geq 0 \\ \end{aligned} \]There is another special case:
\[ \begin{aligned} \max \quad & x_1\\ \text{subject to} \quad & - x_1 \leq -12 \\ & x \geq 0 \\ \end{aligned} \]The solution is unbounded.
We can notice this case in the Simplex method when we cannot find a column to leave the basis because no $\bar S_{j i} > 0$ exists.
In practice, "degenerate" vertices can cause problems:
Here, the Simplex method can cycle, revisiting the same bases forever. Good pivot rules (such as Bland's rule) and numerical tricks help avoid this in practice.
The most expensive step in the Simplex method is the computation of $S_B^{-1}$.
The trick is not to recalculate the inverse at every step, but only to update it.
For example, via an LU decomposition combined with Forrest-Tomlin updates.
Lower Triangular Matrix (L)
A (square) matrix that has nonzero entries only on or below the diagonal:
\[ L = \begin{bmatrix} l_{11} & 0 & 0 & \dots & 0 \\ l_{21} & l_{22} & 0 & \dots & 0 \\ l_{31} & l_{32} & l_{33} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & l_{n3} & \dots & l_{nn} \end{bmatrix} \]
Suppose we want to solve $Lx = b$.
\[ \begin{align*} &l_{11} x_1 &= b_1 \\ &l_{21} x_1 + l_{22} x_2 &= b_2 \\ &l_{31} x_1 + l_{32} x_2 + l_{33} x_3 &= b_3 \\ &\vdots \\ &l_{n1} x_1 + l_{n2} x_2 + l_{n3} x_3 + \dots + l_{nn} x_n &= b_n \end{align*} \]
Solving for $x_1$ is simple: \[ x_1 = \frac{b_1}{l_{11}} \]
Once we know $x_1$, solving for $x_2$ is also simple:
\[ x_2 = \frac{b_2 - l_{21} x_1}{l_{22}} \]We can quickly solve the entire system of equations analogously.
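A minimal sketch of this forward substitution in Python (the function name is ours):
import numpy as np

def forward_substitution(L, b):
    # Solve L x = b row by row, reusing the already computed entries of x
    n = len(b)
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - L[i, :i].dot(x[:i])) / L[i, i]
    return x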
An LU decomposition of a matrix is the factorization of the matrix into a Lower Triangular Matrix (L) and an Upper Triangular Matrix (U):
\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \dots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \dots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \dots & a_{nn} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0 & \dots & 0 \\ l_{21} & l_{22} & 0 & \dots & 0 \\ l_{31} & l_{32} & l_{33} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & l_{n3} & \dots & l_{nn} \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} & \dots & u_{1n} \\ 0 & u_{22} & u_{23} & \dots & u_{2n} \\ 0 & 0 & u_{33} & \dots & u_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & u_{nn} \end{bmatrix} \]
We can now quickly solve the system of equations $Ax = b$: first solve $Ly = b$ by forward substitution, then $Ux = y$ by backward substitution.
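In practice, we would rely on a library for this; for example, SciPy exposes the factorization directly (a sketch):
import numpy as np
from scipy.linalg import lu_factor, lu_solve
A = np.array([[0.8, 0.5], [0.2, 0.5]])
b = np.array([1500, 900])
lu, piv = lu_factor(A)         # LU decomposition with row pivoting
print(lu_solve((lu, piv), b))  # => [1000. 1400.]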
Once we have an LU decomposition and only one row and one column changes in the original matrix, the decomposition can be updated with little effort.