adamant's blog

By adamant, history, 25 hours ago, In English

Hi everyone!

Recently, linear programming (LP) duality was brought up in a Discord server, and it made me think again about how, whenever it is introduced, it's most often very poorly motivated. Some examples that I've seen include:

  • Directly deriving it from general Lagrange duality (which I did myself in the past, and I like this method, but it's not the best fit for LP);
  • Giving some vague economics-based explanation in terms of goods quantities and costs;
  • Making a very handwavy argument about approximating the target function with constraints;
  • Simply proving it symbolically, without a good explanation of what we're doing and why.

After some thinking, I would like to share a geometric interpretation of LP duality that, I think, intuitively and directly explains the meaning of our actions, and why the dual actually delivers the same result (and not just an upper bound) as the primal problem.

Tl;dr. Both the primal and the dual LP problems find the supporting hyperplane of the feasible set with a given normal vector. However, the primal problem parameterizes the feasible set and finds the point where the hyperplane touches the set, while the dual problem parameterizes all hyperplanes with the given normal vector such that the feasible set lies below them, and finds the hyperplane with the minimum parameter among them.

Duality formulations

When I was writing my blog about LP duality, I used the following (symmetric) formulation for primal and dual problems:

Primal problem:

$$$\begin{gather} \max\limits_{x \in \mathbb R^n} & c^\top x\\ s.t. & Ax \leq b\\ & x \geq 0 \end{gather}$$$

Dual problem:

$$$\begin{gather} \min\limits_{\lambda \in \mathbb R^m} & \lambda^\top b\\ s.t. & \lambda^\top A \geq c^\top\\ & \lambda^\top \geq 0 \end{gather}$$$

Here, we have:

  • $$$n$$$ primal variables, stored in the vector $$$x \in \mathbb R^n$$$;
  • $$$m$$$ inequalities, defined by the matrix $$$A \in \mathbb R^{m \times n}$$$ and the vector $$$b \in \mathbb R^{m}$$$;
  • $$$m$$$ dual variables, stored in the vector $$$\lambda \in \mathbb R^m$$$;
  • A vector $$$c \in \mathbb R^n$$$, defining the primal target function $$$c^\top x$$$.

Reminder 1: "$$$^\top$$$" stands for transposing a vector, i.e. reinterpreting it as a row vector instead of a column vector.

Reminder 2: $$$a^\top b$$$ for two vectors $$$a, b \in \mathbb R^n$$$ is equal to $$$a \cdot b$$$, their scalar product.

Reminder 3: $$$a \leq b$$$ in vector expressions means that $$$a_i \leq b_i$$$ for all $$$i$$$.

This formulation seemed to be what most of the literature uses, and it looked good due to its symmetry. But at the same time, something about it was bugging me... The thing about this formulation is that, on one hand, it is very clear that for any feasible $$$x$$$ and $$$\lambda$$$, we have the so-called weak duality (the dual problem provides an upper bound on the primal problem):

$$$ c^\top x \leq \lambda^\top A x \leq \lambda^\top b $$$

But it is not clear at all why this also implies strong duality, i.e. $$$c^\top x = \lambda^\top b$$$ at the optimum, for linear programs. And to me it was largely unclear not only from the point of view of formal proofs, but even more so from the intuition perspective: it doesn't seem obvious at all.
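As a sanity check, here is a tiny toy LP (the instance and the feasible points below are my own, chosen by hand purely for illustration) where the weak-duality chain can be evaluated numerically, and where it in fact holds with equality because both points happen to be optimal:

```python
# Toy LP: maximize 3*x1 + 2*x2  s.t.  x1 + x2 <= 4,  2*x1 + x2 <= 6,  x >= 0.
A = [[1, 1], [2, 1]]
b = [4, 6]
c = [3, 2]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [2, 2]    # primal feasible (in fact optimal): 3*2 + 2*2 = 10
lam = [1, 1]  # dual feasible: lam^T A = (3, 2) >= c, lam >= 0

# lam^T A, computed column by column
lamA = [sum(lam[i] * A[i][j] for i in range(2)) for j in range(2)]

primal_value = dot(c, x)   # c^T x
middle = dot(lamA, x)      # lam^T A x
dual_value = dot(lam, b)   # lam^T b

# Weak duality: c^T x <= lam^T A x <= lam^T b ...
assert primal_value <= middle <= dual_value
# ... and here the chain collapses to 10 = 10 = 10 (strong duality at optimum).
print(primal_value, middle, dual_value)
```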

For example, one of the standard interpretations of duality goes along the lines of "well, we try to bound the answer as much as possible with a linear combination of the given inequalities, and it works". And while this interpretation is somewhat useful and largely correct, it still leaves open the question of why exactly we should expect the dual problem to deliver the optimal solution to the primal one.

And when I was thinking about it, it became abundantly clear to me that there actually is a nice and clear geometric interpretation of what is going on, except that the $$$\lambda^\top A \geq c^\top$$$ constraint doesn't really translate into anything geometrically nice, and it seemingly would make much more sense to look for $$$\lambda^\top A = c^\top$$$. Well, and the funniest thing is... we actually can do just that!

We just need to get rid of the $$$x \geq 0$$$ constraint that is so prevalent in the standard/canonical definitions of the LP primal problem:

Primal problem:

$$$\begin{gather} \max\limits_{x \in \mathbb R^n} & c^\top x\\ s.t. & Ax \leq b \end{gather}$$$

Dual problem:

$$$\begin{gather} \min\limits_{\lambda \in \mathbb R^m} & \lambda^\top b\\ s.t. & \lambda^\top A = c^\top\\ & \lambda^\top \geq 0 \end{gather}$$$

Note that it doesn't reduce the generality of the formulation:

Indeed, we can treat $$$x \geq 0$$$ as additional constraints of the form $$$-x \leq 0$$$. This effectively adds more dual variables $$$\mu^\top$$$, making the feasible region of the dual problem $$$\lambda^\top A - \mu^\top = c^\top$$$, s.t. $$$\lambda^\top, \mu^\top \geq 0$$$. But then, since $$$\mu^\top$$$ is otherwise arbitrary, we can simply eliminate it and relax the constraint into the inequality $$$\lambda^\top A \geq c^\top$$$, recovering an equivalent of the canonical formulation.

Okay, it probably isn't clear yet why this formulation is so much better. Let's delve into it.

Geometry of the primal problem


Feasible set (blue) and level set of the target function (red) at optimum $$$x_0$$$

Intuitively, every linear program defines a feasible set $$$S = \{x \in \mathbb R^n : Ax \leq b\}$$$. In other words, it is a convex polytope of allowed points from which we can select $$$x$$$. Then, the goal is to find the maximum of $$$c^\top x$$$ over all $$$x \in S$$$.

Geometrically, a natural way to visualize this would be to pick an $$$x_0 \in S$$$, and consider the level set of the target function at $$$x_0$$$:

$$$L(x_0) = \{x \in \mathbb R^n : c^\top x = c^\top x_0\}$$$

Note that the condition is equivalent to $$$c^\top (x-x_0) = 0$$$, which essentially defines a hyperplane that passes through $$$x_0$$$, and has the normal vector $$$c$$$. Intuitively, for the optimum $$$x \in S$$$, the set $$$L(x)$$$ is the supporting (tangent) hyperplane of $$$S$$$ with the normal vector $$$c$$$.

Indeed, from $$$x \in S$$$ it follows that $$$L(x)$$$ contains at least one point of $$$S$$$ ($$$x$$$ itself), and if $$$S$$$ is not entirely below $$$L(x)$$$, it means that we can pick another point of $$$S$$$ above $$$L(x)$$$ with a larger scalar product.
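To make this concrete, here is a small sketch on a hand-picked toy polytope. Since a linear function attains its maximum over a polytope at a vertex, it is enough to check the (hand-listed) vertices: the level set at the maximizer has all of them on or below it.

```python
# Toy polytope: x1 + x2 <= 4, 2*x1 + x2 <= 6, x >= 0, with vertices listed by hand.
c = (3, 2)
vertices = [(0, 0), (3, 0), (2, 2), (0, 4)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x_star = max(vertices, key=lambda v: dot(c, v))  # maximizer of c^T x over S
d = dot(c, x_star)                               # parameter of L(x*) = H(c, d)

# L(x*) supports S: it touches S at x*, and no vertex lies strictly above it.
assert all(dot(c, v) <= d for v in vertices)
print(x_star, d)
```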

Geometry of the dual problem


Feasible set (blue) and its dominating hyperplanes

In the primal problem, we pick $$$x \in S$$$, draw a hyperplane passing through it, and adjust $$$x$$$ to maximize its parameter.

There is another way to do it. For a vector $$$c \in \mathbb R^n$$$ and a scalar $$$d \in \mathbb R$$$, we define a hyperplane $$$H(c, d) = \{x \in \mathbb R^n : c^\top x = d\}$$$. Then, we say that $$$H$$$ dominates the set $$$S$$$ (denoted $$$S \leq H$$$) if all points of $$$S$$$ are below $$$H$$$, i.e. $$$c^\top x \leq d$$$ for all $$$x \in S$$$.

Now, instead of picking a point $$$x \in S$$$ and maximizing the parameter, we can consider the set of all hyperplanes with the normal $$$c$$$ that dominate $$$S$$$, and pick the one with the smallest parameter $$$d$$$. And this is exactly what the LP dual is doing!

Essentially, in terms of the hyperplanes and the feasible set $$$S$$$, we can formulate the primal and the dual as follows:

$$$ \begin{gather} \begin{matrix} \max\limits_{d \in \mathbb R} & d\\ s.t. & S \cap H(c, d) \neq \varnothing \end{matrix} &\iff& \begin{matrix} \min\limits_{d \in \mathbb R} & d\\ s.t. & S \leq H(c, d) \end{matrix} \end{gather} $$$

And their actual formulations from above are just parameterizations of these search spaces. The only question that remains is... why?
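Indeed, on a small instance the two sides match. Here is a brute-force search over the equality-form dual on a toy LP of my own (with $$$x \geq 0$$$ folded in as extra rows); this is only a numeric illustration, not a proof:

```python
# S = {x : Ax <= b}: x1 + x2 <= 4, 2*x1 + x2 <= 6, and x >= 0 as rows -x_i <= 0.
A = [[1, 1], [2, 1], [-1, 0], [0, -1]]
b = [4, 6, 0, 0]
c = [3, 2]  # the primal optimum of c^T x over S is 10, attained at (2, 2)

best = None
steps = [k / 10 for k in range(51)]  # coarse grid for the first two multipliers
for l1 in steps:
    for l2 in steps:
        # The equality lam^T A = c^T pins down the remaining multipliers:
        # l1 + 2*l2 - l3 = 3  and  l1 + l2 - l4 = 2.
        l3 = l1 + 2 * l2 - 3
        l4 = l1 + l2 - 2
        if l3 < 0 or l4 < 0:
            continue  # lam >= 0 is violated
        value = l1 * b[0] + l2 * b[1] + l3 * b[2] + l4 * b[3]  # lam^T b
        if best is None or value < best:
            best = value

print(best)  # matches the primal optimum
```

The minimum found over the grid is 10, the same as the primal optimum, which is exactly the strong-duality phenomenon the rest of the post explains.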

Implied inequalities

You can stop reading if the explanation above is sufficient for you, and you don't care about finer details

Consider a set of inequalities $$$a_i \leq b_i$$$ for $$$i=1,2,\dots$$$, where $$$a_i, b_i \in \mathbb R$$$. What other inequalities can we deduce from them?

As follows from the axioms of real numbers, we can get new inequalities by adding existing ones together, i.e. getting $$$a_i + a_j \leq b_i + b_j$$$, or by multiplying any inequality by a non-negative number, i.e. getting $$$k a_i \leq k b_i$$$ for $$$k \geq 0$$$. Finally, we can also add $$$0 \leq 1$$$, which is trivially true, to the set of inequalities we work with. These operations are actually enough to derive all valid linear inequalities.

Let's recall how the feasible set $$$S$$$ is defined. The expression $$$Ax \leq b$$$ is simply a short-hand for the collection of $$$m$$$ linear inequalities that all look like $$$a_i^\top x \leq b_i$$$, where $$$a_i \in \mathbb R^n$$$ are the rows of $$$A$$$, and $$$b_i \in \mathbb R$$$ are the entries of $$$b$$$. Applying the logic above, we can deduce

$$$ (\lambda_1 a_1 + \dots + \lambda_m a_m)^\top x \leq \lambda_1 b_1 + \dots + \lambda_m b_m $$$

for any $$$\lambda_1,\dots,\lambda_m \geq 0$$$. This can be written compactly as $$$\lambda^\top Ax \leq \lambda^\top b$$$, where $$$\lambda \geq 0$$$.
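For instance, on a toy pair of inequalities (the system and the weights are chosen by hand), the combination with non-negative weights yields an implied inequality that every feasible point automatically satisfies:

```python
# Rows of a toy system Ax <= b: x1 + x2 <= 4 and 2*x1 + x2 <= 6.
A = [[1, 1], [2, 1]]
b = [4, 6]
lam = [1, 1]  # non-negative weights

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# The implied inequality (lam^T A) x <= lam^T b, here 3*x1 + 2*x2 <= 10.
lamA = [sum(lam[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
lam_b = dot(lam, b)

# Every point satisfying the original rows also satisfies the implied one.
for x in [(0, 0), (3, 0), (2, 2), (0, 4)]:
    assert all(dot(A[i], x) <= b[i] for i in range(len(A)))
    assert dot(lamA, x) <= lam_b
print(lamA, lam_b)
```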

What does it mean for $$$S$$$ in terms of hyperplanes? We can see this inequality as defining the hyperplane with the normal vector $$$(\lambda^\top A)^\top = A^\top \lambda$$$ and the parameter $$$\lambda^\top b$$$. Thus, we get the statement $$$S \leq H(A^\top \lambda, \lambda^\top b)$$$.

So, if we look at the dual formulation again:

$$$\begin{gather} \min\limits_{\lambda \in \mathbb R^m} & \lambda^\top b\\ s.t. & \lambda^\top A = c^\top\\ & \lambda^\top \geq 0 \end{gather}$$$

we can now see that, indeed, we are just looking at some hyperplanes that dominate $$$S$$$. So the only question is: why is the supporting hyperplane among them? And the answer is: because all hyperplanes that dominate $$$S$$$ are there (as long as we don't fix the normal vector).

Claim. Let $$$S = \{x \in \mathbb R^n : Ax \leq b\} \neq \varnothing$$$. Then, the set

$$$ P=\{H(A^\top \lambda, \lambda^\top b + \mu) : \lambda, \mu \geq 0\}, $$$

where $$$\lambda \in \mathbb R^m$$$ and $$$\mu \in \mathbb R$$$, is exactly the set of all hyperplanes that dominate $$$S$$$.
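Before the proof, the easy direction of the claim (every hyperplane of this form dominates $$$S$$$) can be spot-checked numerically on a toy polytope; again, checking vertices suffices, because a linear function is maximized over a polytope at a vertex. The converse direction is what the proof below is for.

```python
import random

# Toy polytope: x1 + x2 <= 4, 2*x1 + x2 <= 6, x >= 0, vertices listed by hand.
A = [[1, 1], [2, 1], [-1, 0], [0, -1]]
b = [4, 6, 0, 0]
vertices = [(0, 0), (3, 0), (2, 2), (0, 4)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

random.seed(0)
for _ in range(1000):
    lam = [random.uniform(0, 5) for _ in range(4)]
    mu = random.uniform(0, 5)
    normal = [sum(lam[i] * A[i][j] for i in range(4)) for j in range(2)]  # A^T lam
    d = dot(lam, b) + mu
    # H(A^T lam, lam^T b + mu) dominates S: no vertex lies above it.
    assert all(dot(normal, v) <= d for v in vertices)
print("ok")
```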

Proof

This essentially mirrors the general proof of LP duality, but approaches it from a geometric perspective.

With all this in mind, we can confirm that the linear programming dual problem simply parameterizes the set of dominating hyperplanes of the feasible set, and then finds, among them, the supporting hyperplane with the given normal vector.


»
25 hours ago

it's most often very poorly motivated

nah it's just described by the people who use it

  • »
    »
    23 hours ago

    I think this oversimplifies the issue. There are a lot of domains beyond economics that rely on it (statistics, machine learning, data analysis, computer science), but they all tend to do poorly at providing a meaningful intuition for it, especially when it comes to concisely explaining why the solution to the dual is an exact solution to the primal, and not just an upper bound.

    • »
      »
      »
      15 hours ago

      Point still stands; Dantzig was probably an economist, but not a geometer.

      • »
        »
        »
        »
        8 hours ago

        Yes, and it's fine to provide a good rationale outside of geometry. The issue here is that I spent 3 hours yesterday trying to understand how the economist's perspective on the LP dual works. And while the economist explanation I was given for the primal problem makes total sense, the explanation for the dual felt very stretched and unnatural (even from an economics POV), and it left unclear why the result would exactly match the primal, rather than just being an upper bound.

        It could well be that this is an issue with the particular version of the explanation (although I think they're all similar to it), but even then it kind of strengthens the point that most explanations used in courses, etc., fail to convey this clearly enough, even within their own context.

»
22 hours ago

GOAT is back

  • »
    »
    18 hours ago

    GOAT for what? He's just searching for some ugly maths and papers online, copying and pasting them here, so you and others think that he's a great mathematician, while he's just a fake impostor.

    • »
      »
      »
      16 hours ago

      If you think this is very advanced, hard math, you are greatly mistaken... (although adamant does have maths blogs).

      Of course, he is not inventing new tricks in all of his blogs, but writing a comprehensive tl;dr about a non-trivial concept (which is no easy task, and it is very useful!). Adamant happens to be quite good at doing it. If you're not interested, do not click.

      Anyway, I haven't read the blog yet, but it seems not to talk about extremal points; could that be a good idea for a part 2?

      • »
        »
        »
        »
        8 hours ago

        It doesn't mention them directly, but it should be kind of clear that a supporting hyperplane should touch the feasible set at some extremal point? And, maybe less clear but still useful, at an extremal point you can represent the normal vector of any supporting hyperplane as a conic combination of the normal vectors of the faces that touch the point.

        But I'm not sure what exactly could be said about them beyond that...

    • »
      »
      »
      8 hours ago

      Whether it is ugly or not is subjective (I wouldn't post about something I personally find ugly, unless it is about what I think is a nice way to make it less ugly), but in any case what I'm doing is by no means just a copy paste.

      Yes, often there are papers that describe the same concept as the blog. But I always spend quite some time figuring out the best way to present their results, and this often includes, e.g., coming up with my own proofs, perspectives and explanations that you wouldn't easily find in papers online; or, if you do find them, it would be difficult to see that this is what they actually mean, because many papers are not very good at articulating their point, or because it is scattered over a few different papers.

      I of course understand that the way I present material is not the best fit for a lot of people (and you can argue that I'm not good at articulating my blogs in an accessible manner either), but hopefully there are also some like me, who would enjoy my style.

      I'm not trying to present myself as a great mathematician, and I don't feel like I am. There are people like ecnerwala, Elegia, hos.lyric here, and I can only dream to achieve even a fraction of how good they are at math.

      In fact, I very often find it difficult to understand advanced results, and writing blogs about them is my way of structuring them so that they make sense to me. For a lot of blogs, I had very little understanding of the subject when I started writing, so the blog itself is a kind of study diary, and it reflects the way I would have wanted the material presented to me when I was studying the subject.