# Monge Problem

The optimal transport problem first came up in the 18th century, brought by Gaspard Monge. The original problem is called the Monge Problem, which wants to find an optimal way to rearrange the dirt dig out from the land into castle walls or other desired shapes. The "optimal way" means the way with the minimal "cost", or workload. In the following, we will first formulate the Monge problem, then explain why it is a challenging problem. On top of that, we introduce the Kantorovich's problem which is a generalized version of Monge problem but it perfectly tackles the challenges. Finally, we introduce Brenier's theorem and discuss the solvability of these two problems.

## Formulation of Monge Problem

To begin with, we use probability measures to represent the piles of dirt. Suppose ${\displaystyle (X,d)}$ is a metric space that is equipped with a natural topology generated by all the open balls. The topology defines the collection of open sets, and we denote ${\displaystyle {\mathcal {B}}(X)}$ as the ${\displaystyle \sigma }$-algebra generated by the open sets, which is called the Borel ${\displaystyle \sigma }$-algebra on the metric space ${\displaystyle (X,d)}$. Denote ${\displaystyle {\mathcal {M}}(X)}$ as the set of all finite measures(a pile has finite amount of dirt!) defined on the measurable space ${\displaystyle (X,{\mathcal {B}}(X))}$. For a measure ${\displaystyle \mu \in {\mathcal {M}}(X)}$ and ${\displaystyle B\in {\mathcal {B}}(X)}$, ${\displaystyle \mu (B)}$ represents how much dirt is contained in the set ${\displaystyle B}$, so a measure can precisely characterize the shape of a dirt pile. When we rearrange the dirt pile, we keep the total amount unchanged, so without losing of generality we can assume the total mass ${\displaystyle \mu (X)=1}$, meaning we restrict our attention to the set of all probability measures on ${\displaystyle (X,{\mathcal {B}}(X))}$, denoted by ${\displaystyle {\mathcal {P}}(X)}$.

The next step is to define the behavior of "transport". A possible way is to use the following notion of transport map.

Definition 1. Transport map

Give two probability measures ${\displaystyle \mu ,\nu \in {\mathcal {P}}(X)}$, a measurable function ${\displaystyle t:X\rightarrow X}$ transports ${\displaystyle \mu }$ onto ${\displaystyle \nu }$ if

${\displaystyle \nu (B)=\mu (t^{-1}(B)),\ \forall \ B\in {\mathcal {B}}(X)}$

where ${\displaystyle t^{-1}=\{x\in X:t(x)\in B\}}$ is the preimage. We call ${\displaystyle \nu }$ the push-forward measure of ${\displaystyle \mu }$ under ${\displaystyle t}$, denoted by ${\displaystyle \nu =t_{\#}\mu }$ or ${\displaystyle \nu =\mu \circ t^{-1}}$.

The intuition of this definition is that a transport map ${\displaystyle t}$ moves all of the dirt at ${\displaystyle x\in X}$ to the location ${\displaystyle t(x)}$. Now we can formulate the Monge problem mathematically.

Definition 2. Monge's Problem

Given ${\displaystyle \mu ,\nu \in {\mathcal {P}}(X)}$, Monge problem is the following optimization problem.

${\displaystyle \inf _{t}\int _{X}|t(x)-x|d\mu (x),\ s.t.\ t:X\rightarrow X\ {\text{is measurable and }}t_{\#}\mu =\nu }$

To illustrate, ${\displaystyle |t(x)-x|}$ is the distance of moving the dirt from ${\displaystyle x}$ to ${\displaystyle t(x)}$, and the integral ${\displaystyle \int _{X}|t(x)-x|d\mu (x)}$ is the total cost or effort. The condition ${\displaystyle t_{\#}\mu =\nu }$ ensures that ${\displaystyle t}$ is a transport map from ${\displaystyle \mu }$ onto ${\displaystyle \nu }$, meaning ${\displaystyle t}$ rearranges the dirt in the shape of ${\displaystyle \mu }$ to look like that of ${\displaystyle \nu }$. If ${\displaystyle t}$ solves the Monge problem, meaning it realized the infimum, then we call it an optimal transport map. Here, we use the cost function ${\displaystyle c(x^{1},x^{2})=|x^{1}-x^{2}|}$, but it can also be replaced by other general cost functions. The problem can easily be generalized to the one regarding two different spaces ${\displaystyle X,Y}$, with ${\displaystyle \mu \in {\mathcal {P}}(X)}$, ${\displaystyle \nu \in {\mathcal {P}}(Y)}$ and ${\displaystyle c:X\times Y\rightarrow [0,\infty )}$ is the cost function, but it is essentially the same as the one regarding the same space.

The Monge problem is important because it provides us with spatial insight of how far two probability measures are. If two probability measures are "closed to each other", there should be some transport map such that the total effort of transport is small. This idea leads to the concept of Wasserstein metric.

## Challenges of Monge Problem

The Monge problem turns out to be a very challenging problem for several reasons as follows.

• The constraint set can be empty. Given ${\displaystyle \mu ,\nu \in {\mathcal {P}}(X)}$, it does not necessarily exist a transport map from ${\displaystyle \mu }$ onto ${\displaystyle \nu }$: ${\displaystyle t_{\#}\mu =\nu }$. For example, consider ${\displaystyle \mu =\delta _{0}}$, the point mass at ${\displaystyle 0}$, and ${\displaystyle \nu }$ is the uniform measure on ${\displaystyle [0,1]}$. For every measurable function ${\displaystyle t:X\rightarrow X}$, ${\displaystyle t_{\#}\mu }$ should be a point mass at ${\displaystyle t(0)}$ so ${\displaystyle t}$ cannot yield a uniform distribution on ${\displaystyle [0,1]}$. The reason is that a transport map cannot split the mass concentrated at one point${\displaystyle ---}$it transports all the mass at point ${\displaystyle x}$ to the location ${\displaystyle t(x)}$.
• Solutions may not be unique. Let ${\displaystyle \mu }$, ${\displaystyle \nu }$ be the uniform distribution on ${\displaystyle (0,1)}$ and ${\displaystyle (1/4,5/4)}$ respectively. Consider the following two transport maps.

(1) Transport all the mass on ${\displaystyle (0,1/4)}$ to ${\displaystyle (1,5/4)}$ and keep the rest unchanged. In this case,

${\displaystyle t_{0}(x)={\begin{cases}x+1,&x\in (0,1/4);\\x,&x\in (1/4,1);\end{cases}}}$

is the transport map and the cost is ${\displaystyle \int |t_{0}(x)-x|d\mu (x)=\int _{0}^{1/4}1dx=1/4}$.

(2) Shift all the mass to the right by ${\displaystyle 1/4}$. In this case, ${\displaystyle t_{1}(x)=x+1/4}$ is the transport map and the cost is ${\displaystyle \int |t_{1}(x)-x|d\mu (x)=\int _{0}^{1}1/4dx=1/4}$.

It is not hard to imagine that other methods of transport will cause larger effort, so this problem has two different optimal solutions.

• The constraint set is not convex. In general, for an optimization problem, we usually take an initial guess of the solution and perturb it to see if the objective function decreases. The convexity of constraint set guarantees that the point after perturbation is still in the constraint set. However, this is not always true for Monge problem. Consider the transport maps ${\displaystyle t_{0}}$ and ${\displaystyle t_{1}}$ defined in the last example. Their convex combination is
${\displaystyle t_{\alpha }(x)=(1-\alpha )t_{0}(x)+\alpha t_{1}(x)={\begin{cases}(1-\alpha )(x+1/4)+\alpha (x+1),&x\in (0,1/4);\\(1-\alpha )(x+1/4)+\alpha x,&x\in (1/4,1);\end{cases}}}$

Take ${\displaystyle \alpha =1/2}$, we have

${\displaystyle t_{1/2}(x){\begin{cases}x+5/8,&x\in (0,1/4);\\x+1/8,&x\in (1/4,1);\end{cases}}}$

Then ${\displaystyle \nu ((5/8,3/4))=1/8}$, but ${\displaystyle (t_{{1/2}\#}\mu )((5/8,3/4))=\mu ((0,1/8)\cup (1/2,5/8))=1/4}$, so ${\displaystyle t_{1/2}}$ does not give a transport map from ${\displaystyle \mu }$ onto ${\displaystyle \nu }$ any more.

## A generalized problem: Kantorovich's Problem

An approach to tackle the problems mentioned in Section II is to use a "transport plan" instead of a transport map. This gives a more generalized version of the problem called Kantorovich Problem.

Definition 3. Transport plan

Given ${\displaystyle \mu ,\nu \in {\mathcal {P}}(X)}$, a transport plan is a probability measure ${\displaystyle \gamma \in {\mathcal {P}}(X\times X)}$ on the product space ${\displaystyle (X\times X,{\mathcal {B}}(X\times X))}$ such that ${\displaystyle (\pi ^{1})_{\#}\gamma =\mu }$, ${\displaystyle (\pi ^{2})_{\#}\gamma =\nu }$, where ${\displaystyle \pi ^{i}:X\times X\rightarrow X}$, ${\displaystyle \pi ^{i}(x_{1},x_{2})=x_{i}}$, ${\displaystyle i=1,2}$ are projection maps. The set of all transport plans from ${\displaystyle \mu }$ to ${\displaystyle \nu }$ is denoted by ${\displaystyle \Gamma (\mu ,\nu )}$.

The conditions ${\displaystyle (\pi ^{1})_{\#}\gamma =\mu }$, ${\displaystyle (\pi ^{2})_{\#}\gamma =\nu }$ give the right marginals:

${\displaystyle \mu (A)=((\pi ^{1})_{\#}\gamma )(A)=\gamma ((\pi ^{1})^{-1}(A))=\gamma (A\times X),\ \nu (B)=((\pi ^{2})_{\#}\gamma )(A)=\gamma ((\pi ^{2})^{-1}(A))=\gamma (X\times B)}$

Therefore, a transport plan is a probability measure on the product space ${\displaystyle X\times X}$ that has marginals distributions ${\displaystyle \mu }$ and ${\displaystyle \nu }$. In addition, ${\displaystyle \gamma (A\times B)}$ represents the amount of mass taken from set ${\displaystyle A}$ that is sent to set ${\displaystyle B}$. Since ${\displaystyle \gamma (A\times B)\leq \gamma (A\times X)=\mu (A)}$, a transport plan can move only a fraction but not necessarily all of the mass at some place to other locations, and this solves the problem that a transport map cannot split a point mass. Moreover, any transport map can be represented as a transport plan: if ${\displaystyle t}$ is a transport map with ${\displaystyle t_{\#}\mu =\nu }$, then ${\displaystyle \gamma \equiv (id\times t)_{\#}\mu }$ is a transport plan in ${\displaystyle \Gamma (\mu ,\nu )}$, where ${\displaystyle id}$ is the identity map.

Based on the notion of transport plans, we can state Kantorovich's problem, which is a problem more general than Monge problem. Instead of optimizing the cost over all transport maps, it minimizes the cost over all transport plans.

Definition 4. Kantorovich's problem

Given ${\displaystyle \mu ,\nu \in {\mathcal {P}}(X)}$, the following optimization problem is the Kantorovich's problem:

${\displaystyle \inf _{\gamma :\gamma \in \Gamma (\mu ,\nu )}\int _{X\times X}|x^{1}-x^{2}|^{p}d\gamma (x^{1},x^{2}),\ p\geq 1}$

Denote the cost by ${\displaystyle \mathbb {K} _{p}(\gamma )\equiv \int _{X\times X}|x^{1}-x^{2}|d\gamma (x^{1},x^{2})}$, which is a functional on ${\displaystyle \Gamma (\mu ,\nu )}$. If ${\displaystyle \gamma }$ attains the infimum, we call it an optimal transport plan. Here, we use the cost function ${\displaystyle c(x^{1},x^{2})=|x^{1}-x^{2}|^{p}}$, but it can also be replaced by other general cost functions.

The Kantorovich's problem is a better-behaved problem for the following reasons.

• The constraint sets is always nonempty. For ${\displaystyle \mu ,\nu \in {\mathcal {P}}(X)}$, the product measure ${\displaystyle \mu \times \nu }$ is always a transport plan in ${\displaystyle \Gamma (\mu ,\nu )}$.
• The constraint set is convex. Take ${\displaystyle \gamma _{0}}$, ${\displaystyle \gamma _{1}\in \Gamma (\mu ,\nu )}$, the convex combination is ${\displaystyle \gamma _{\alpha }\equiv (1-\alpha )\gamma _{0}+\alpha \gamma _{1},\ \alpha \in [0,1]}$. Notice that it gives the right marginals: ${\displaystyle (\pi _{\#}^{1}\gamma _{\alpha })(A)=\gamma _{\alpha }(A\times X)=(1-\alpha )\gamma _{0}(A\times X)+\alpha \gamma _{1}(A\times X)=(1-\alpha )\mu (A)+\alpha \mu (A)=\mu (A)}$, and ${\displaystyle (\pi _{\#}^{2}\gamma _{\alpha })(B)=\nu (B)}$ holds similarly.
• The objective function is convex as well. Notice that ${\displaystyle \mathbb {K} _{p}(\gamma _{\alpha })=(1-\alpha )\mathbb {K} _{p}(\gamma _{0})+\alpha \mathbb {K} _{p}(\gamma _{1})}$, so objective function is actually linear.
• Kantorovich's problem has a dual problem.
• The minimizer exists via the direct method of the calculus of variations.

## Brenier's Theorem, Solvability of the problem

In this section, we consider a special case where ${\displaystyle X=\mathbb {R} ^{d}}$ and the cost function is quadratic: ${\displaystyle c(x^{1},x^{2})=|x^{1}-x^{2}|^{2}}$. These assumptions allow us to obtain quite strong results for the solvability of Monge and Kantorovich's problems, which is known as the following Brenier's theorem.

Theorem 1. Brenier's theorem

Given ${\displaystyle \mu ,\nu \in {\mathcal {P}}_{2}(\mathbb {R} ^{d})}$, ${\displaystyle \mu \ll {\mathcal {L}}^{d}}$, where ${\displaystyle {\mathcal {P}}_{2}(\mathbb {R} ^{d})\equiv \{\mu \in {\mathcal {P}}(\mathbb {R} ^{d}):\int |x|^{2}d\mu (x)<\infty \}}$ and ${\displaystyle {\mathcal {L}}^{d}}$ is the Lebesgue measure on ${\displaystyle \mathbb {R} ^{d}}$. Then there exists a unique optimal transport plan ${\displaystyle \gamma _{\star }}$ of the form ${\displaystyle \gamma _{\star }=(id\times t_{\star })_{\#}\mu }$ for the Kantorovich's problem, where ${\displaystyle t_{\star }:\mathbb {R} ^{d}\rightarrow \mathbb {R} ^{d}}$ is a measurable function. Notice that ${\displaystyle t_{\star }}$ is actually a transport map from ${\displaystyle \mu }$ onto ${\displaystyle \nu }$, so ${\displaystyle t_{\star }}$ solves the corresponding Monge problem. In particular, the optimal cost of two problems coincide:

${\displaystyle \inf _{t:t_{\#}\mu =\nu }\mathbb {M} _{2}(t)=\inf _{\gamma \in \Gamma (\mu ,\nu )}\mathbb {K} _{2}(\gamma )}$

where ${\displaystyle \mathbb {M} _{2}(t)\equiv \int |t(x)-x|^{2}d\mu (x)}$.

The proof of the result relies heavily on the duality property of the Kantorovich's problem. By passing to the dual problem, we are able to give the expression of ${\displaystyle t}$ as the gradient of the Kantorovich potential[1]. The assumption that ${\displaystyle \mu }$ is absolute continuous w.r.t Lebesgue measure can be weakened to not giving mass on sets of ${\displaystyle d-1}$ dimensional Hausdorff measure [2], and the result still holds for cost functions in the form of ${\displaystyle c(x^{1},x^{2})=h(x^{1}-x^{2})}$ for some convex function ${\displaystyle h}$.