Geometric meaning of LP duality

Hi everyone!

Recently, linear programming (LP) duality was brought up in a Discord server, and it made me think again about how, whenever it is introduced, it's most often very poorly motivated. Some examples that I've seen include:

  • Directly deriving it from general Lagrange duality (which I did myself in the past, and while I like this method, it's not the best fit for LP);
  • Giving some vague economics-based explanation in terms of goods quantities and costs;
  • Making a very handwavy argument about approximating the target function with constraints;
  • Simply proving it symbolically, without a good explanation of what we're doing and why.

After some thinking, I would like to share a geometric interpretation of LP duality that, I think, very intuitively and directly explains what we're doing, and why the dual actually delivers the same result as the primal problem (and not just an upper bound).

Tl;dr. Both primal and dual LP problems find the supporting hyperplane of the feasible set with the given normal vector. However, the primal problem parameterizes the feasible set and finds the touching point of the hyperplane within the set, while the dual problem parameterizes all hyperplanes with the given normal vector s.t. the feasible set lies below them, and finds the hyperplane with the minimum parameter among them.

Duality formulations

When I was writing my blog about LP duality, I used the following (symmetric) formulation for primal and dual problems:

Primal problem:

$$$\begin{aligned} \max\limits_{x \in \mathbb R^n} \quad & c^\top x\\ \text{s.t.} \quad & Ax \leq b\\ & x \geq 0 \end{aligned}$$$

Dual problem:

$$$\begin{aligned} \min\limits_{\lambda \in \mathbb R^m} \quad & \lambda^\top b\\ \text{s.t.} \quad & \lambda^\top A \geq c^\top\\ & \lambda \geq 0 \end{aligned}$$$

Here, we have:

  • $$$n$$$ primal variables, stored in the vector $$$x \in \mathbb R^n$$$;
  • $$$m$$$ inequalities, defined by the matrix $$$A \in \mathbb R^{m \times n}$$$ and the vector $$$b \in \mathbb R^{m}$$$;
  • $$$m$$$ dual variables, stored in the vector $$$\lambda \in \mathbb R^m$$$;
  • A vector $$$c \in \mathbb R^n$$$, defining the primal target function $$$c^\top x$$$.

Reminder 1: "$$$^\top$$$" stands for transposing the vector, i.e. reinterpreting it as a row vector instead of a column vector.

Reminder 2: $$$a^\top b$$$ for two vectors $$$a, b \in \mathbb R^n$$$ is equal to $$$a \cdot b$$$, their scalar product.

Reminder 3: $$$a \leq b$$$ in vector expressions means that $$$a_i \leq b_i$$$ for all $$$i$$$.

This formulation seemed to be what most of the literature uses, and it looked good due to its symmetry. But at the same time, something about it was bugging me... The thing about this formulation is that, on one hand, it is very clear that for any feasible $$$x$$$ and $$$\lambda$$$, we have the so-called weak duality (the dual problem gives an upper bound on the primal one):

$$$ c^\top x \leq \lambda^\top A x \leq \lambda^\top b $$$

(The first inequality uses $$$\lambda^\top A \geq c^\top$$$ together with $$$x \geq 0$$$; the second uses $$$Ax \leq b$$$ together with $$$\lambda \geq 0$$$.) But it is not clear at all why this also implies strong duality, i.e. $$$c^\top x = \lambda^\top b$$$ at the respective optima of linear programs. And to me it was largely unclear not only from the point of view of formal proofs, but even more so from the intuition perspective: it doesn't seem obvious at all.

For example, one of the standard interpretations of duality goes along the lines of "well, we try to bound the answer as much as possible with a linear combination of the given inequalities, and it works". And while this interpretation is somewhat useful and largely correct, it still leaves open the question of why exactly we should expect the dual problem to deliver the optimal solution to the primal one.
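That said, strong duality is easy to observe numerically before we try to understand it. Below is a minimal sanity check, assuming SciPy's linprog is available; the instance $$$(A, b, c)$$$ is an arbitrary toy of mine:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([1.0, 1.0])

# Primal: max c^T x  s.t.  Ax <= b, x >= 0.  linprog minimizes, so pass -c.
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)

# Dual: min λ^T b  s.t.  λ^T A >= c^T, λ >= 0.
# Rewrite the >=-constraint as -A^T λ <= -c to fit linprog's <=-form.
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

x, lam = primal.x, dual.x
print(c @ x, lam @ A @ x, lam @ b)  # the chain c^T x <= λ^T A x <= λ^T b
assert np.isclose(c @ x, lam @ b)   # strong duality: the two optima coincide
```

Here all three printed values come out as $$$2.8$$$: at the optimum, the whole chain collapses into equalities.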

And when I was thinking about it, it became abundantly clear to me that there is actually a nice geometric interpretation of what is going on, except that the $$$\lambda^\top A \geq c^\top$$$ constraint doesn't really translate into anything geometrically nice, and it would seemingly make much more sense to look for $$$\lambda^\top A = c^\top$$$. Well, and the funniest thing is... We actually can do just that!

We just need to get rid of the $$$x \geq 0$$$ constraint that is so prevalent in standard/canonical definitions of the LP primal problem:

Primal problem:

$$$\begin{aligned} \max\limits_{x \in \mathbb R^n} \quad & c^\top x\\ \text{s.t.} \quad & Ax \leq b \end{aligned}$$$

Dual problem:

$$$\begin{aligned} \min\limits_{\lambda \in \mathbb R^m} \quad & \lambda^\top b\\ \text{s.t.} \quad & \lambda^\top A = c^\top\\ & \lambda \geq 0 \end{aligned}$$$

Note that it doesn't reduce the generality of the formulation:

Indeed, we can treat $$$x \geq 0$$$ as additional constraints of the form $$$-x \leq 0$$$. This effectively adds more dual variables $$$\mu^\top$$$, making the feasible region of the dual problem $$$\lambda^\top A - \mu^\top = c^\top$$$, s.t. $$$\lambda^\top, \mu^\top \geq 0$$$. But then, since $$$\mu^\top$$$ is otherwise arbitrary, we can simply eliminate it and turn the constraint into the inequality $$$\lambda^\top A \geq c^\top$$$, recovering the canonical formulation.
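To double-check that this pair is consistent, we can again solve both sides numerically. A sketch under the same assumptions as before (SciPy, toy data of mine), with $$$x \geq 0$$$ folded into $$$A$$$ as the rows $$$-x_i \leq 0$$$, exactly as described above:

```python
import numpy as np
from scipy.optimize import linprog

# The last two rows of A encode x >= 0 as -x_i <= 0.
A = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [-1.0, 0.0],
              [0.0, -1.0]])
b = np.array([4.0, 6.0, 0.0, 0.0])
c = np.array([1.0, 1.0])

# Primal: max c^T x  s.t.  Ax <= b, with x otherwise completely free.
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)

# Dual: min λ^T b  s.t.  λ^T A = c^T, λ >= 0 -- an equality constraint now.
dual = linprog(b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 4)

assert np.isclose(-primal.fun, dual.fun)  # both sides agree
```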

Okay, it probably isn't clear yet why this formulation is so much better. Let's delve into it.

Geometry of the primal problem


[Figure: feasible set (blue) and level set of the target function (red) at optimum $$$x_0$$$]

Intuitively, every linear program defines a feasible set $$$S = \{x \in \mathbb R^n : Ax \leq b\}$$$. In other words, a convex polyhedron (a possibly unbounded polytope) defining the allowed set from which we can select points $$$x$$$. Then, the goal is to find the maximum of $$$c^\top x$$$ over all $$$x \in S$$$.

Geometrically, a natural way to visualize this would be to pick an $$$x_0 \in S$$$, and consider the level set of the target function at $$$x_0$$$:

$$$L(x_0) = \{x \in \mathbb R^n : c^\top x = c^\top x_0\}$$$

Note that the condition is equivalent to $$$c^\top (x-x_0) = 0$$$, which essentially defines a hyperplane that passes through $$$x_0$$$, and has the normal vector $$$c$$$. Intuitively, for the optimum $$$x \in S$$$, the set $$$L(x)$$$ is the supporting (tangent) hyperplane of $$$S$$$ with the normal vector $$$c$$$.

Indeed, from $$$x \in S$$$, it follows that $$$L(x)$$$ contains at least one point of $$$S$$$ ($$$x$$$ itself), and if $$$S$$$ is not entirely below $$$L(x)$$$, we can pick another point of $$$S$$$ above $$$L(x)$$$ with a larger value of $$$c^\top x$$$, contradicting the optimality of $$$x$$$.
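For a concrete toy picture, take $$$S = \{x \in \mathbb R^2 : x_1 \leq 1,\ x_2 \leq 1\}$$$ and $$$c = (1, 2)^\top$$$. The optimum is $$$x = (1, 1)$$$ with $$$c^\top x = 3$$$, and the level set $$$L(x) = \{x : x_1 + 2x_2 = 3\}$$$ touches $$$S$$$ at $$$(1, 1)$$$, while every other point of $$$S$$$ lies strictly below it. So, $$$L(x)$$$ is exactly the supporting hyperplane of $$$S$$$ with the normal vector $$$c$$$.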

Geometry of the dual problem


[Figure: feasible set (blue) and its dominating hyperplanes]

In the primal problem, we pick $$$x \in S$$$, draw the hyperplane with the normal vector $$$c$$$ passing through it, and adjust $$$x$$$ to maximize the hyperplane's parameter $$$c^\top x$$$.

There is another way to do it. For a vector $$$c \in \mathbb R^n$$$ and a scalar $$$d \in \mathbb R$$$, we define a hyperplane $$$H(c, d) = \{x \in \mathbb R^n : c^\top x = d\}$$$. Then, we say that $$$H$$$ dominates the set $$$S$$$ (we denote it as $$$S \leq H$$$) if all points of $$$S$$$ are below $$$H$$$, i.e. $$$c^\top x \leq d$$$ for all $$$x \in S$$$.

Now, instead of picking a point $$$x \in S$$$ and maximizing the parameter, we can instead consider the set of all hyperplanes with the normal vector $$$c$$$ that dominate $$$S$$$, and pick the one with the smallest parameter $$$d$$$. And this is actually exactly what the LP dual is doing!

Essentially, in terms of the hyperplanes and the feasible set $$$S$$$, we can formulate the primal and the dual as follows:

$$$ \begin{matrix} \max\limits_{d \in \mathbb R} & d\\ \text{s.t.} & S \cap H(c, d) \neq \varnothing \end{matrix} \quad\iff\quad \begin{matrix} \min\limits_{d \in \mathbb R} & d\\ \text{s.t.} & S \leq H(c, d) \end{matrix} $$$
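Going back to the toy example with $$$S = \{x : x_1 \leq 1,\ x_2 \leq 1\}$$$ and $$$c = (1, 2)^\top$$$: the left problem slides the hyperplane $$$x_1 + 2x_2 = d$$$ upwards for as long as it still intersects $$$S$$$, stopping at $$$d = 3$$$ with the touching point $$$(1, 1)$$$, while the right problem slides it downwards for as long as $$$S$$$ stays below it, also stopping at $$$d = 3$$$. Both arrive at the same supporting hyperplane from opposite sides.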

And their actual formulations from above are just the parameterizations of the search space. The only question that remains is... Why?

Implied inequalities

You can stop reading if the explanation above is sufficient for you, and you don't care about finer details

Consider a set of inequalities $$$a_i \leq b_i$$$ for $$$i = 1, 2, \dots$$$, where $$$a_i, b_i \in \mathbb R$$$. What other inequalities can we deduce from them?

As follows from the axioms of the real numbers, we can get new inequalities by adding existing ones together, i.e. getting $$$a_i + a_j \leq b_i + b_j$$$, or by multiplying an inequality by a non-negative number, i.e. getting $$$k a_i \leq k b_i$$$ for $$$k \geq 0$$$. Finally, we can also add $$$0 \leq 1$$$, which is trivially true, to the set of inequalities that we work with. These operations are actually enough for us to get all valid linear inequalities.

Let's recall how the feasible set $$$S$$$ is defined. The expression $$$Ax \leq b$$$ is simply a short-hand for a collection of $$$m$$$ linear inequalities of the form $$$a_i^\top x \leq b_i$$$, where $$$a_i \in \mathbb R^n$$$ are the rows of $$$A$$$, and $$$b_i \in \mathbb R$$$ are the entries of $$$b$$$. Applying the logic above, we can deduce

$$$ (\lambda_1 a_1 + \dots + \lambda_m a_m)^\top x \leq \lambda_1 b_1 + \dots + \lambda_m b_m $$$

for any $$$\lambda_1, \dots, \lambda_m \geq 0$$$. This can be written in short as $$$\lambda^\top A x \leq \lambda^\top b$$$, where $$$\lambda \geq 0$$$.
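This is easy to verify empirically. A small sketch assuming NumPy, with an arbitrary toy instance of mine and rejection-sampled feasible points:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [-1.0, 0.0],
              [0.0, -1.0]])
b = np.array([4.0, 6.0, 0.0, 0.0])

# Collect feasible points of S = {x : Ax <= b} by rejection sampling.
pts = rng.uniform(-1.0, 3.0, size=(10_000, 2))
feasible = pts[(pts @ A.T <= b).all(axis=1)]

# Any nonnegative combination of the rows is a valid inequality on S.
lam = rng.uniform(0.0, 5.0, size=4)  # an arbitrary λ >= 0
normal, d = A.T @ lam, lam @ b       # the inequality (λ^T A) x <= λ^T b
assert (feasible @ normal <= d + 1e-9).all()
```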

What does it mean for $$$S$$$ in terms of hyperplanes? We can see this inequality as defining the hyperplane with the normal vector $$$(\lambda^\top A)^\top = A^\top \lambda$$$ and the parameter $$$\lambda^\top b$$$. Thus, we get the statement $$$S \leq H(A^\top \lambda, \lambda^\top b)$$$.

So, if we look at the dual formulation again:

$$$\begin{aligned} \min\limits_{\lambda \in \mathbb R^m} \quad & \lambda^\top b\\ \text{s.t.} \quad & \lambda^\top A = c^\top\\ & \lambda \geq 0 \end{aligned}$$$

we can now see that, indeed, we are just looking at some of the hyperplanes that dominate $$$S$$$. So, the only question is: why is the supporting hyperplane among them? And the answer is: because all hyperplanes that dominate $$$S$$$ are there (if we don't fix the normal vector).

Claim. Let $$$S = \{x \in \mathbb R^n : Ax \leq b\}$$$. Then, the set

$$$ P=\{H(A^\top \lambda, \lambda^\top b + \mu) : \lambda, \mu \geq 0\}, $$$

where $$$\lambda \in \mathbb R^m$$$ and $$$\mu \in \mathbb R$$$, is exactly the set of all hyperplanes that dominate $$$S$$$.

Proof

This, essentially, mirrors the general proof of LP duality, but approaches it from a geometric perspective.
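To see the claim in action on the toy example from above (so $$$S = \{x : x_1 \leq 1,\ x_2 \leq 1\}$$$, i.e. $$$A = I$$$ and $$$b = (1, 1)^\top$$$): the hyperplanes with the normal vector $$$c = (1, 2)^\top$$$ that dominate $$$S$$$ are exactly $$$x_1 + 2x_2 = d$$$ for $$$d \geq 3$$$, and the claim produces each of them as $$$H(A^\top \lambda, \lambda^\top b + \mu)$$$ with $$$\lambda = (1, 2)^\top$$$ (so that $$$A^\top \lambda = c$$$ and $$$\lambda^\top b = 3$$$) and the slack $$$\mu = d - 3 \geq 0$$$.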

With all this in mind, we can confirm that the linear programming dual problem simply parameterizes the set of dominating hyperplanes of the feasible set, and then finds the supporting hyperplane with the given normal vector among them.

Tags: linear programming, duality
