The optimal transport problem first came up in the 18th century, brought by Gaspard Monge. The original problem is called the Monge Problem, which wants to find an optimal way to rearrange the dirt dig out from the land into castle walls or other desired shapes. The "optimal way" means the way with the minimal "cost", or workload. In the following, we will first formulate the Monge problem, then explain why it is a challenging problem. On top of that, we introduce the Kantorovich's problem which is a generalized version of Monge problem but it perfectly tackles the challenges. Finally, we introduce Brenier's theorem and discuss the solvability of these two problems.
Formulation of Monge Problem
To begin with, we use probability measures to represent the piles of dirt. Suppose is a metric space that is equipped with a natural topology generated by all the open balls. The topology defines the collection of open sets, and we denote as the -algebra generated by the open sets, which is called the Borel -algebra on the metric space . Denote as the set of all finite measures(a pile has finite amount of dirt!) defined on the measurable space . For a measure and , represents how much dirt is contained in the set , so a measure can precisely characterize the shape of a dirt pile. When we rearrange the dirt pile, we keep the total amount unchanged, so without losing of generality we can assume the total mass , meaning we restrict our attention to the set of all probability measures on , denoted by .
The next step is to define the behavior of "transport". A possible way is to use the following notion of transport map.
Definition 1. Transport map
Give two probability measures , a measurable function transports onto if
where is the preimage. We call the push-forward measure of under , denoted by or .
The intuition of this definition is that a transport map moves all of the dirt at to the location . Now we can formulate the Monge problem mathematically.
Definition 2. Monge's Problem
Given , Monge problem is the following optimization problem.
To illustrate, is the distance of moving the dirt from to , and the integral is the total cost or effort. The condition ensures that is a transport map from onto , meaning rearranges the dirt in the shape of to look like that of . If solves the Monge problem, meaning it realized the infimum, then we call it an optimal transport map. Here, we use the cost function , but it can also be replaced by other general cost functions. The problem can easily be generalized to the one regarding two different spaces , with , and is the cost function, but it is essentially the same as the one regarding the same space.
The Monge problem is important because it provides us with spatial insight of how far two probability measures are. If two probability measures are "closed to each other", there should be some transport map such that the total effort of transport is small. This idea leads to the concept of Wasserstein metric.
Challenges of Monge Problem
The Monge problem turns out to be a very challenging problem for several reasons as follows.
- The constraint set can be empty. Given , it does not necessarily exist a transport map from onto : . For example, consider , the point mass at , and is the uniform measure on . For every measurable function , should be a point mass at so cannot yield a uniform distribution on . The reason is that a transport map cannot split the mass concentrated at one pointit transports all the mass at point to the location .
- Solutions may not be unique. Let , be the uniform distribution on and respectively. Consider the following two transport maps.
(1) Transport all the mass on to and keep the rest unchanged. In this case,
is the transport map and the cost is .
(2) Shift all the mass to the right by . In this case, is the transport map and the cost is .
It is not hard to imagine that other methods of transport will cause larger effort, so this problem has two different optimal solutions.
- The constraint set is not convex. In general, for an optimization problem, we usually take an initial guess of the solution and perturb it to see if the objective function decreases. The convexity of constraint set guarantees that the point after perturbation is still in the constraint set. However, this is not always true for Monge problem. Consider the transport maps and defined in the last example. Their convex combination is
Take , we have
Then , but , so does not give a transport map from onto any more.
A generalized problem: Kantorovich's Problem
An approach to tackle the problems mentioned in Section II is to use a "transport plan" instead of a transport map. This gives a more generalized version of the problem called Kantorovich Problem.
Definition 3. Transport plan
Given , a transport plan is a probability measure on the product space such that , , where , , are projection maps. The set of all transport plans from to is denoted by .
The conditions , give the right marginals:
Therefore, a transport plan is a probability measure on the product space that has marginals distributions and . In addition, represents the amount of mass taken from set that is sent to set . Since , a transport plan can move only a fraction but not necessarily all of the mass at some place to other locations, and this solves the problem that a transport map cannot split a point mass. Moreover, any transport map can be represented as a transport plan: if is a transport map with , then is a transport plan in , where is the identity map.
Based on the notion of transport plans, we can state Kantorovich's problem, which is a problem more general than Monge problem. Instead of optimizing the cost over all transport maps, it minimizes the cost over all transport plans.
Definition 4. Kantorovich's problem
Given , the following optimization problem is the Kantorovich's problem:
Denote the cost by , which is a functional on . If attains the infimum, we call it an optimal transport plan. Here, we use the cost function , but it can also be replaced by other general cost functions.
The Kantorovich's problem is a better-behaved problem for the following reasons.
- The constraint sets is always nonempty. For , the product measure is always a transport plan in .
- The constraint set is convex. Take , , the convex combination is . Notice that it gives the right marginals: , and holds similarly.
- The objective function is convex as well. Notice that , so objective function is actually linear.
- Kantorovich's problem has a dual problem.
- The minimizer exists via the direct method of the calculus of variations.
Brenier's Theorem, Solvability of the problem
In this section, we consider a special case where and the cost function is quadratic: . These assumptions allow us to obtain quite strong results for the solvability of Monge and Kantorovich's problems, which is known as the following Brenier's theorem.
Theorem 1. Brenier's theorem
Given , , where and is the Lebesgue measure on . Then there exists a unique optimal transport plan of the form for the Kantorovich's problem, where is a measurable function.
Notice that is actually a transport map from onto , so solves the corresponding Monge problem. In particular, the optimal cost of two problems coincide:
The proof of the result relies heavily on the duality property of the Kantorovich's problem. By passing to the dual problem, we are able to give the expression of as the gradient of the Kantorovich potential. The assumption that is absolute continuous w.r.t Lebesgue measure can be weakened to not giving mass on sets of dimensional Hausdorff measure , and the result still holds for cost functions in the form of for some convex function .