# Discrete Optimal Transport

If the measures ${\displaystyle \mu ,\nu }$ are both finite convex combinations of Dirac masses, the Kantorovich and Monge problems may be stated as linear programs[1], and solved using classical methods[2]. This insight may be used to approximate the Wasserstein distances between general measures, by first discretizing the source and target with a known level of accuracy (a difficult problem!), then computing the cost between the discrete measures.

# Statement of the Discrete Problems

We suppose that ${\displaystyle \mu =\sum _{i=1}^{M}m_{i}\delta _{x_{i}}}$ and ${\displaystyle \nu =\sum _{j=1}^{N}n_{i}\delta _{y_{i}}}$ (assuming the weights ${\displaystyle m_{i}}$ and ${\displaystyle n_{j}}$ are all positive, the ${\displaystyle x_{i}}$ and ${\displaystyle y_{j}}$ are distinct, and ${\displaystyle \sum _{i=1}^{M}m_{i}=1=\sum _{j=1}^{N}n_{j}}$). Then any transport plan ${\displaystyle \gamma =\sum _{i=1}^{M}\sum _{j=1}^{N}g_{ij}\delta _{(x_{i},y_{j})}}$ for scalars ${\displaystyle g_{ij}}$ (if it charged anything other than the product of the supports of ${\displaystyle \mu }$ and ${\displaystyle \nu }$, it would not have the correct marginals). For notation, set ${\displaystyle m=(m_{i})_{i}\in \mathbb {R} ^{M}}$, ${\displaystyle n=(n_{j})_{j}\in \mathbb {R} ^{N}}$, ${\displaystyle G=(g_{ij})_{ij}\in {\mathcal {M}}_{M\times N}}$, ${\displaystyle C=(\lVert x_{i}-y_{j}\rVert ^{2})_{ij}=:(c_{ij})\in {\mathcal {M}}_{M\times N}}$ (or more generally ${\displaystyle C=(c(x_{i},y_{j}))_{ij}}$ for generic cost functions), and denote by ${\displaystyle \mathbf {1} _{d}\in \mathbb {R} ^{d}}$ the vector of all ones of dimension ${\displaystyle d}$. Essentially, we are replacing the space ${\displaystyle X\times X}$ with ${\displaystyle \left\{1,\ldots ,M\right\}\times \left\{1,\ldots ,N\right\}}$ and passing from measures and functions to vectors and matrices.

## Primal Problems

With all this notation, we can say the collection of transport plans is analogous to

${\displaystyle \Gamma (\mu ,\nu )\cong {\mathcal {G}}(m,n):=\left\{G\in {\mathcal {M}}_{M\times N}:g_{ij}\geq 0,G\mathbf {1} _{N}=m,G^{\top }\mathbf {1} _{M}=n\right\},}$
and the analogue of the (original) Kantorovich Problem is
${\displaystyle \inf _{\gamma \in \Gamma (\mu ,\nu )}\int \lVert x-y\rVert ^{2}\,d\gamma (x,y)\cong \inf _{G\in {\mathcal {G}}(m,n)}\sum _{i=1}^{M}\sum _{j=1}^{N}c_{ij}g_{ij}=\inf _{G\in {\mathcal {G}}(m,n)}{\text{tr}}\left(C^{\top }G\right)=\inf _{G\in {\mathcal {G}}(m,n)}C:G}$
(where ${\displaystyle C:G}$ is just shorthand for ${\displaystyle {\text{tr}}\left(C^{\top }G\right)}$, an inner product on matrices).

We could define a transport map as a function ${\displaystyle t:\left\{1,\ldots ,M\right\}\to \left\{1,\ldots ,N\right\}}$, but it will be more convenient to define it as a matrix in ${\displaystyle {\mathcal {M}}_{N\times M}}$, subject to constraints (erasing the difference between a transport map and the transport plan it defines). We can say the collection of transport maps is

${\displaystyle {\mathcal {T}}(m,n):=\left\{T\in {\mathcal {M}}_{M\times N}:t_{ij}\in \left\{0,1\right\},T\mathbf {1} _{N}=m,T^{\top }\mathbf {1} _{M}=n\right\}\subseteq {\mathcal {G}}(m,n).}$
That is, a transport map corresponds to a plan which does not split mass, but instead sends all mass at ${\displaystyle x_{i}}$ to the ${\displaystyle y_{j}}$ such that ${\displaystyle t_{ij}=1}$ (since ${\displaystyle m}$ and ${\displaystyle n}$ correspond to probability measures—that is, ${\displaystyle \mathbf {1} _{M}^{\top }m=1=\mathbf {1} _{N}^{\top }n}$—this is unique for a given ${\displaystyle T}$). Then the analogue of the Monge Problem is
${\displaystyle \inf _{T\in {\mathcal {T}}(m,n)}C:T.}$
As in the general case, this may fail to exist; in fact, ${\displaystyle {\mathcal {T}}(m,n)}$ may be empty, for example if ${\displaystyle M.

## Dual Problem

This also has an analogue.

# Useful Combinatorial Structure

The set ${\displaystyle {\mathcal {G}}(m,n)}$ is a closed polytope in ${\displaystyle {\mathcal {M}}_{M\times N}}$, and since ${\displaystyle G\mapsto C:G}$ (resp. ${\displaystyle T\mapsto C:T}$) is a linear functional bounded below by zero on ${\displaystyle {\mathcal {G}}}$, it is quite classical[3] (perhaps taking a compact set formed by restricting to those plans with ${\displaystyle C:G\leq C:G_{0}}$ for some fixed ${\displaystyle G_{0}\in {\mathcal {G}}(m,n)}$) that its minimum is obtained at an extreme point.

# Algorithms

## Simplex Algorithm

This is also very classical.

## Improved Algorithms

There are other, improved methods with their own articles, like the Auction Algorithm and Sinkhorn's Algorithm. Perhaps short descriptions/groupings of them should be put on this page to collect the information?