
Section 1.1 Optimization Problems, Certificates and Duality

Subsection 1.1.1 What is Discrete Optimization?

Before answering this question, let us focus on a more fundamental question: “what is optimization?” The following provides a very general definition of an optimization problem.

Definition 1.1.1. Optimization Problem.

An optimization problem asks for the maximum of \(f(x)\) over all \(x\in X\) where \(X\) is a set called the domain of the problem and \(f:X\to\mathbb{R}\) is a function called the objective function. In cases where the maximum does not exist—e.g. if \(f\) is unbounded on \(X\) or \(X=\emptyset\)—then we are interested in computing the supremum. We use the following shorthand for describing such an optimization problem:
\begin{align*} \text{max } & f(x)\\ \text{subject to } & x\in X. \end{align*}
We are also interested in analogous minimization problems which are denoted in a similar way but with “min” in the place of “max”.
This formulation should remind you of the “extremal value problems” that you encountered in Calculus, e.g. maximizing a differentiable function over a given domain. In Calculus, the single-variable case boils down to taking the derivative, finding the critical points, checking the boundary, and so on; in the multivariate setting, one may need to apply additional tools, such as Lagrange multipliers. Some of the intuition that you gained in Calculus will occasionally be useful in this course.
The main feature that distinguishes a discrete (or combinatorial) optimization problem from a general one is the fact that, in a discrete problem, the domain \(X\) is finite. Such problems frequently arise in practical applications, such as developing an exam schedule, assigning tasks to computers, routing traffic through a transportation network or decrypting an encrypted message. Since the domain is finite, in some sense, a discrete optimization problem is trivial to solve; one can simply compute the value of \(f(x)\) for each \(x\in X\text{,}\) one by one, and output the largest value.
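The “evaluate the objective at every point” approach described above can be sketched in a few lines of Python. This is an illustrative sketch only; the function name and the toy objective are ours, not part of the notes.

```python
def brute_force_max(X, f):
    """Evaluate f at every point of the finite domain X and return the best.

    Runs in |X| evaluations of f, which is hopeless when |X| is enormous.
    """
    best_x, best_val = None, float("-inf")
    for x in X:
        if f(x) > best_val:
            best_x, best_val = x, f(x)
    return best_x, best_val

# Maximize f(x) = -(x - 3)^2 over the finite domain {0, 1, ..., 9}.
print(brute_force_max(range(10), lambda x: -(x - 3) ** 2))  # (3, 0)
```

The running time here is exactly \(|X|\) objective evaluations, which is why this approach fails on the huge domains discussed next.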
So, if discrete optimization problems are trivial to solve, then what is the point of studying Discrete Optimization? The catch is that the domain of a combinatorial optimization problem, while finite, is often enormous (due to, e.g., combinatorial explosion). When dealing with a huge finite domain, it can become infeasible to simply compute the objective function at all possible inputs. Thus, in order to solve many practical problems, we are incentivized to develop methods for not only finding the solution to a discrete optimization problem, but finding it as quickly as possible. Consider the following example.

Problem 1.1.2. The Sorting Problem.

  • Input: A list of \(n\) English words.
  • Goal: List these words in alphabetical order.
One can view Problem 1.1.2 as a discrete optimization problem where the domain \(X\) consists of all \(n!\) orderings of the words and the objective function \(f\) computes the number of pairs of words that are in the wrong order. It may seem theoretically possible to solve this problem by computing \(f(x)\) for every \(x\in X\text{,}\) one by one, and outputting the ordering \(x\) which achieves \(f(x)=0\text{.}\) However, this approach could take as long as \(n!\) steps (or more) which, if \(n\geq 65\text{,}\) would be far more than the number of atoms in the observable universe! The Oxford English Dictionary contains more than 600,000 words. Even the world’s most powerful supercomputer would be unable to order the words of a dictionary of this size by following the “try every ordering” approach. For this problem, there are many simple algorithms that take a maximum of about \(n^2\) steps (e.g. InsertionSort, Example 1.2.1), and a few more sophisticated ones which take about \(n\log(n)\) steps (e.g. MergeSort, Example 1.2.3), which is far superior to a running time of \(n!\text{.}\)
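The optimization view of sorting can be made concrete: the objective below counts the pairs of words in the wrong relative order, and the “try every ordering” approach searches all \(n!\) permutations for one achieving the value \(0\text{.}\) This is a minimal sketch with made-up data, purely to illustrate why the brute-force approach is only feasible for tiny \(n\text{.}\)

```python
from itertools import permutations

def inversions(words):
    """Objective f: number of pairs of words in the wrong relative order."""
    return sum(1 for i in range(len(words))
                 for j in range(i + 1, len(words))
                 if words[i] > words[j])

words = ["pear", "apple", "fig", "date"]
# Brute force: examine all n! orderings and keep one minimizing f.
best = min(permutations(words), key=inversions)
print(list(best))  # ['apple', 'date', 'fig', 'pear']
```

With \(n=4\) this loop examines \(24\) orderings; with \(n=65\) it would already be astronomically slow, while InsertionSort or MergeSort handle such inputs instantly.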
To summarize, the focus of discrete optimization is on developing efficient (i.e. fast) algorithms to solve discrete optimization problems. Of course, there is no hope to be able to do this in full generality. If \(f\) is an arbitrary objective function on a finite domain \(X\text{,}\) then there is no way to determine the maximum of \(f(x)\) for \(x\in X\) without computing the function at every point. A key idea is that, in many practical applications, the domain \(X\) has special structure that one can exploit to locate the optimum more quickly. At the same time, some discrete optimization problems turn out to be highly resistant to our attempts to solve them via an efficient algorithm. In these cases, it is valuable to rigorously prove that such problems are intractable (meaning that an efficient algorithm does not exist, or is highly unlikely to exist). Let us look at an illustrative example.

Subsection 1.1.2 A Sample Problem

Imagine that you are the manager of a company with \(n\) workers and \(m\geq n\) tasks to be completed. Each worker can only do one task per day and each task requires only one worker to complete it. Moreover, each worker is qualified to do some of the tasks but not necessarily all of them (due to, e.g., special training required for certain tasks).

Problem 1.1.3. An Assignment Problem.

How can we efficiently find an assignment of workers to tasks to maximize the number of tasks that are completed in a given day?
Of course, this optimization problem has a finite domain and so it is easy to find the optimum if one has unlimited time and resources. However, as in Problem 1.1.2, solving the problem via an exhaustive search through all possibilities will not be practical. For example, suppose that we simply go through every possible assignment of the \(n\) workers to \(n\) of the tasks and choose the assignment that maximizes the number of workers who are qualified for the task that they are assigned to. The number of steps in this algorithm is at least as large as the number of injective functions from a set of size \(n\) to a set of size \(m\text{,}\) which is
\begin{equation*} m(m-1)\cdots(m-n+1) = \frac{m!}{(m-n)!}. \end{equation*}
The function \(\frac{m!}{(m-n)!}\) grows extremely quickly as both \(m\) and \(n\) grow. Therefore, this “brute-force” algorithm is extremely slow and could not be used in practice. In this course, we will see several approaches for solving problems of this type more quickly; in particular, an efficient algorithm for solving Problem 1.1.3 is presented in Section 3.3.
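For a toy instance, the exhaustive search over all \(\frac{m!}{(m-n)!}\) injective assignments can be written out directly. The qualification data below is hypothetical, and the function name is ours; the point is only that the loop ranges over exactly the injective functions counted above.

```python
from itertools import permutations

def best_assignment(qualified, n, m):
    """Brute force over all injective assignments of n workers to m tasks.

    `qualified[w]` is the set of tasks worker w can do (hypothetical data).
    The loop below visits exactly m!/(m-n)! candidate assignments.
    """
    best, best_count = None, -1
    for tasks in permutations(range(m), n):  # one tuple per injective map
        count = sum(1 for w, t in enumerate(tasks) if t in qualified[w])
        if count > best_count:
            best, best_count = tasks, count
    return best, best_count

qualified = {0: {0, 1}, 1: {1}, 2: {2, 3}}
assignment, completed = best_assignment(qualified, n=3, m=4)
print(completed)  # 3
```

Even at \(n=15\) workers and \(m=20\) tasks this loop would visit roughly \(2\times 10^{16}\) assignments, which is already impractical.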
Our first step in approaching this problem is to “abstract it” beyond the context of workers and tasks to a purely mathematical question. For this, we will need the notion of a graph and, in particular, a bipartite graph.

Definition 1.1.4. Graph.

A graph \(G\) is a pair \((V,E)\) where \(V\) is a set of vertices and \(E\) is a set of unordered pairs of distinct vertices called edges. We write \(V(G)\) and \(E(G)\) to denote the sets of vertices and edges, respectively, of a graph \(G\text{.}\)
An edge of a graph is usually expressed by concatenating the two vertices within it; i.e. \(uv\in E(G)\) means that the unordered pair \(\{u,v\}\) of vertices is an edge of \(G\text{.}\) When \(uv\in E(G)\text{,}\) we say that \(u\) and \(v\) are adjacent or that they are neighbours. An edge \(e\in E(G)\) is said to be incident to a vertex \(v\in V(G)\) if \(v\) is one of the two vertices in \(e\text{.}\)
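Definition 1.1.4 translates almost verbatim into code: a graph is just a set of vertices together with a set of unordered pairs. The representation below (edges as `frozenset`s) and the helper name are our own choices, shown only to make the notation concrete.

```python
# A graph G = (V, E): vertices, plus unordered pairs of distinct vertices.
V = {"a", "b", "c", "d"}
E = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("c", "d")]}

def neighbours(v, E):
    """The neighbours of v: all vertices sharing an edge with v."""
    return {u for e in E if v in e for u in e if u != v}

print(sorted(neighbours("b", E)))  # ['a', 'c']
```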

Definition 1.1.5. Bipartite.

A graph \(G\) is said to be bipartite if \(V(G)\) can be partitioned into two sets, say \(A\) and \(B\text{,}\) in such a way that every edge of \(G\) has one endpoint in \(A\) and one in \(B\text{.}\) The pair \((A,B)\) is called a bipartition of \(G\text{.}\)
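As an aside, one can test whether a graph is bipartite by attempting to 2-colour it with a breadth-first search: a bipartition \((A,B)\) exists exactly when no edge joins two vertices of the same colour. This standard technique is not developed in the notes; the sketch below is included only to make the definition operational.

```python
from collections import deque

def is_bipartite(V, E):
    """2-colour G by BFS; G is bipartite iff no edge joins equal colours."""
    adj = {v: set() for v in V}
    for u, w in E:
        adj[u].add(w)
        adj[w].add(u)
    colour = {}
    for s in V:                      # handle each connected component
        if s in colour:
            continue
        colour[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    queue.append(w)
                elif colour[w] == colour[u]:
                    return False     # an edge inside one colour class
    return True

print(is_bipartite({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 4)]))  # True (a path)
print(is_bipartite({1, 2, 3}, [(1, 2), (2, 3), (1, 3)]))     # False (a triangle)
```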
Coming back to Problem 1.1.3, let \(A\) be the set of \(n\) workers and \(B\) be the set of tasks and let \(G\) be a graph obtained by adding an edge between a vertex \(w\in A\) and a vertex \(t\in B\) whenever worker \(w\) is qualified for task \(t\text{.}\) The problem stated above is now seen to be equivalent to finding the maximum cardinality of a subset of \(E(G)\) with the property that no vertex is contained in more than one edge of this set; this is known as a matching.

Definition 1.1.6. Matching.

A matching in a graph \(G\) is a subset \(M\subseteq E(G)\) with the property that every vertex \(v\in V(G)\) is incident to (i.e. contained in) at most one edge in \(M\text{.}\)

Definition 1.1.7. Matching Number.

The matching number of a graph \(G\text{,}\) denoted \(\nu(G)\text{,}\) is the largest cardinality of a matching in \(G\text{.}\)
The problem of assigning workers to tasks can now be seen as being equivalent to finding the matching number of a bipartite graph. For a general set \(A\text{,}\) we let \(2^A=\{S: S\subseteq A\}\) denote the collection of all subsets of \(A\text{.}\) We can phrase our problem as a discrete optimization problem by letting \(X=2^{E(G)}\) and \(f:X\to\mathbb{R}\) be defined so that, for each \(S\in X\text{,}\) we have \(f(S)=0\) if \(S\) is not a matching and \(f(S)=|S|\) otherwise. Our goal is to compute the maximum of \(f(S)\) over all \(S\in X\text{.}\) This leads us to another (inefficient) “brute force” algorithm: compute \(f(S)\) for every \(S\in X\) by checking all \(2^{|E(G)|}\) possible elements of \(X\text{.}\) Thus, even for \(|E(G)|=60\text{,}\) this computation would take more than \(2^{60}>10^{18}\) steps, far beyond what is practical!
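The \(2^{|E(G)|}\)-step brute force just described can be written down explicitly: check every subset of the edges, discard the ones that are not matchings, and take the largest that survives. This is an illustrative sketch (our own helper names), usable only for very small graphs.

```python
from itertools import chain, combinations

def is_matching(M):
    """M is a matching iff no vertex lies in more than one edge of M."""
    used = set()
    for u, v in M:
        if u in used or v in used:
            return False
        used.update((u, v))
    return True

def matching_number(E):
    """nu(G) by brute force over all 2^|E| subsets of the edge set."""
    subsets = chain.from_iterable(
        combinations(E, k) for k in range(len(E) + 1))
    return max(len(M) for M in subsets if is_matching(M))

# The path on 4 vertices: the matching {12, 34} is largest.
print(matching_number([(1, 2), (2, 3), (3, 4)]))  # 2
```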
As it turns out, there are much more clever algorithms for computing the matching number of a bipartite graph; in fact, the matching number of any graph can be found in a number of steps that is bounded by a polynomial function of the number of vertices and edges of the graph. This is far superior to the brute force approaches described above which each require exponentially many steps. Later, in Chapter 6, we will see two approaches for solving matching problems efficiently: one involving an application of linear programming and another more combinatorial approach. Throughout the course, we will encounter many other discrete optimization problems with interesting practical applications and the methods that are used to solve these problems efficiently (or evidence that no efficient method exists!).

Subsection 1.1.3 Certificates and Duality

The problem of showing that the maximum of a given objective function \(f\) on a domain \(X\) is equal to a particular value, say \(k\text{,}\) can be thought of in terms of two subproblems:
Subproblem 1: Show that the maximum of \(f(x)\) for \(x\in X\) is at least \(k\text{.}\)
Subproblem 2: Show that the maximum of \(f(x)\) for \(x\in X\) is at most \(k\text{.}\)
Usually, the simplest way to show that the maximum is at least \(k\) is by simply finding some \(x\in X\) such that \(f(x)\geq k\text{.}\) Such an \(x\) certifies that the maximum is at least \(k\) and, for this reason, we say that \(x\) is a certificate of this inequality. However, strictly speaking, proving a lower bound on a maximization problem does not necessarily require one to find a point in the domain with a high objective value. A certificate for a lower bound on a maximization problem can be any argument which implies that such an element of the domain exists, even if one does not explicitly exhibit such an element.
In contrast, certifying an upper bound for a maximization problem seems to be more difficult. For example, if one wants to show that the matching number of a graph is at most \(k\text{,}\) then one needs to rule out the existence of any matching of cardinality \(k+1\text{.}\) We again encounter the problem that checking all subsets of \(E(G)\) of cardinality \(k+1\) quickly becomes impractical for certain values of \(k\) and \(|E(G)|\text{.}\) It would be preferable to identify a simple and easy-to-check reason why the graph cannot possibly have a matching of cardinality \(k+1\text{.}\) For many discrete optimization problems, this boils down to finding a simple “bottleneck” which prevents the objective function from exceeding a certain value. The next example provides a natural bottleneck for the matching problem.

Definition 1.1.8. Vertex Cover.

A vertex cover in a graph \(G\) is a subset \(C\subseteq V(G)\) with the property that every edge of \(G\) is incident to at least one vertex in \(C\text{.}\)

Definition 1.1.9. Vertex Cover Number.

The vertex cover number of a graph \(G\text{,}\) denoted \(\tau(G)\text{,}\) is the smallest cardinality of a vertex cover in \(G\text{.}\)

These two quantities are related as follows: for every graph \(G\text{,}\) we have \(\nu(G)\leq \tau(G)\text{.}\)

Proof.

Let \(M\) be a matching of maximum cardinality and \(C\) be a vertex cover of minimum cardinality in a graph \(G\text{.}\) For each \(e\in M\text{,}\) let \(f(e)\) denote the number of vertices of \(C\) that are incident to \(e\text{.}\) Since \(C\) is a vertex cover, we must have \(f(e)\geq1\) for every \(e\in M\text{.}\) Similarly, for each \(v\in C\text{,}\) let \(g(v)\) be the number of edges in \(M\) that are incident with \(v\text{.}\) Then, since \(M\) is a matching, we have \(g(v)\leq 1\) for all \(v\in C\text{.}\) Combining these facts with a little double counting gives us
\begin{equation*} |M| \leq \sum_{e\in M}f(e) = \sum_{v\in C}g(v) \leq |C|. \end{equation*}
Therefore, we conclude that \(\nu(G)\leq \tau(G)\text{.}\)
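The double counting in this proof can be checked numerically on a small example: the path on four vertices, with the matching \(M=\{12,34\}\) and the vertex cover \(C=\{2,3\}\) (both chosen by hand here for illustration). Both sums count the incident (edge of \(M\), vertex of \(C\)) pairs, one edge at a time and one vertex at a time.

```python
# Path 1-2-3-4, a matching M and a vertex cover C chosen by hand.
M = [(1, 2), (3, 4)]
C = {2, 3}

f = {e: sum(1 for v in C if v in e) for e in M}  # vertices of C on edge e
g = {v: sum(1 for e in M if v in e) for v in C}  # edges of M at vertex v

assert all(f[e] >= 1 for e in M)  # C covers every edge, including those of M
assert all(g[v] <= 1 for v in C)  # M uses each vertex at most once
assert sum(f.values()) == sum(g.values())  # both count the same incidences
print(len(M), "<=", len(C))  # 2 <= 2
```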
Therefore, to certify that \(\nu(G)\leq k\text{,}\) it suffices to find a vertex cover of cardinality \(k\) in \(G\text{.}\) That is, a small vertex cover provides us with a simple bottleneck which implies that the graph does not have a large matching. Unfortunately, the vertex cover number of a graph is not always equal to the matching number and so such a certificate does not always exist. For example, for the 3-vertex complete graph \(K_3\text{,}\) the matching number is 1 while the vertex cover number is 2. However, in the case that \(G\) is bipartite, the matching number is equal to the vertex cover number and so, in this setting, a small vertex cover is the only possible bottleneck which can prevent a large matching.
The equality \(\nu(G)=\tau(G)\) for bipartite graphs is known as Kőnig’s Theorem. We postpone its proof until later in these notes, but note that the standard combinatorial proof is quite simple and beautiful.
Kőnig’s Theorem provides us with our first example of the important idea of “duality” in optimization. The problem of finding a minimum vertex cover of a bipartite graph can be thought of as the dual of the problem of finding the maximum size of a matching. In general, the dual of a maximization problem is typically a minimization problem with the property that any solution to the dual problem provides us with an upper bound on the optimal value of the original primal problem. This is called weak duality. In cases where the optimum value of the dual problem is always equal to that of the primal problem, as for the bipartite matching problem, we use the term strong duality.
The concept of duality is incredibly valuable. For one, it allows us to look at a problem from two different perspectives; e.g., in the case of a bipartite matching problem, we can decide to focus on matchings or on vertex covers, whichever is more convenient. Strong duality may, in fact, be a core principle which distinguishes the problems which can be solved via efficient algorithms and those that cannot. Understanding the boundary between problems that can or cannot be solved efficiently is among the most important problems in mathematics and theoretical computer science. In fact, solving it will earn you $1 million from the Clay Mathematics Institute (see the P vs. NP problem)!
You have undoubtedly seen the concept of duality before during your mathematical studies, but may not have thought about it in full generality. Here are a few examples of duality theorems that may spark your memory, each with a reference to a course at UVic in which such a theorem often appears. We are only including these as examples of things that many of you have seen before, and so we omit the necessary definitions. Some of these theorems will come up again later in the notes. Do not worry if you have not seen all of these theorems before.

Definition 1.1.16. Directed Graph.

A directed graph or a digraph \(D\) is a pair \((V,A)\) where \(V\) is a set of vertices and \(A\) is a set of ordered pairs of vertices called arcs. We write \(V(D)\) and \(A(D)\) to denote the sets of vertices and arcs, respectively, of a digraph \(D\text{.}\)
As with graphs, we write arcs of a digraph by concatenating vertices; i.e. \(uv\in A(D)\) means that the ordered pair \((u,v)\in V\times V\) is an arc of \(D\text{.}\) An arc of the form \((v,v)\) is called a loop. Every graph \(G\) can be represented by a digraph with the same vertex set as \(G\) containing both of the arcs \((u,v)\) and \((v,u)\) for all \(uv\in E(G)\text{.}\) An arc of the form \(uv\) is said to be an outgoing arc of \(u\) and an incoming arc of \(v\) and it is said to be incident to \(u\) and \(v\text{.}\) Also, when \(uv\in A(D)\text{,}\) we say that \(v\) is an out-neighbour of \(u\) and that \(u\) is an in-neighbour of \(v\text{.}\)
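The representation of a graph by a digraph described above is a one-line construction: replace each edge \(uv\) by the two arcs \((u,v)\) and \((v,u)\text{.}\) The helper names below are ours, for illustration.

```python
def to_digraph(E):
    """The digraph representing a graph: arcs (u, v) and (v, u) per edge uv."""
    return {(u, v) for u, v in E} | {(v, u) for u, v in E}

def out_neighbours(v, A):
    """Out-neighbours of v: heads of the outgoing arcs of v."""
    return {w for (u, w) in A if u == v}

A = to_digraph([(1, 2), (2, 3)])
print(sorted(out_neighbours(2, A)))  # [1, 3]
```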