# Sliced Wasserstein Distance

The sliced Wasserstein distance ${\displaystyle SW_{2}}$ is an alternative distance between probability measures which enjoys many of the same properties as the Wasserstein distance. For further reading see Santambrogio (pg. 214-215) [1] and Peyré & Cuturi (pg. 166-169) [2].

## Motivation

One situation in which the Wasserstein distance is easier to compute is the 1D case. In particular, if the measures are of the form ${\displaystyle \alpha ={\tfrac {1}{n}}\textstyle \sum _{i=1}^{n}\delta _{x_{i}}}$ and ${\displaystyle \beta ={\tfrac {1}{n}}\textstyle \sum _{i=1}^{n}\delta _{y_{i}}}$, where ${\displaystyle x_{1}\leq \ldots \leq x_{n}}$ and ${\displaystyle y_{1}\leq \ldots \leq y_{n}}$, then the Wasserstein distance is given by ${\displaystyle W_{p}(\alpha ,\beta )^{p}={\tfrac {1}{n}}\textstyle \sum _{i=1}^{n}|x_{i}-y_{i}|^{p}}$ (Peyré & Cuturi pg. 30 [2]). The simplicity of the 1D case raises the question of whether a Wasserstein-like distance over ${\displaystyle \mathbb {R} ^{d}}$ can be built from the Wasserstein distances between projections onto 1D axes. The sliced Wasserstein distance provides an affirmative answer.
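
The sorted-sample formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming both measures are empirical with the same number of equally weighted atoms; the function name `wasserstein_1d` is hypothetical and not taken from the references.

```python
def wasserstein_1d(xs, ys, p=2):
    """W_p between two 1D empirical measures with n equally weighted atoms each.

    Assumption: len(xs) == len(ys) and every atom has mass 1/n.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same number of points")
    if len(xs) == 0:
        raise ValueError("samples must be non-empty")
    # Sorting realizes the ordering x_1 <= ... <= x_n and y_1 <= ... <= y_n.
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    # W_p^p = (1/n) * sum_i |x_i - y_i|^p, pairing the i-th smallest atoms.
    cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)
```

The cost is dominated by the two sorts, so the computation takes ${\displaystyle O(n\log n)}$ time.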

## Definition

Let ${\displaystyle P_{\theta }:\mathbb {R} ^{d}\to \mathbb {R} }$ be the projection onto the direction of a unit vector ${\displaystyle \theta \in \mathbb {S} ^{d-1}}$, i.e. ${\displaystyle P_{\theta }(x)=x\cdot \theta }$, and write ${\displaystyle P_{\theta \#}\mu }$ for the pushforward of a measure ${\displaystyle \mu }$ by ${\displaystyle P_{\theta }}$. The sliced Wasserstein distance ${\displaystyle SW_{2}}$ on ${\displaystyle {\mathcal {P}}_{2}(\mathbb {R} ^{d})}$ is given by

${\displaystyle SW_{2}(\mu ,\nu )=\left(\int _{\mathbb {S} ^{d-1}}W_{2}(P_{\theta \#}\mu ,P_{\theta \#}\nu )^{2}d\theta \right)^{1/2}}$

Here the integral over ${\displaystyle \theta }$ is with respect to the surface measure on ${\displaystyle \mathbb {S} ^{d-1}}$.

## Properties

The sliced Wasserstein distance satisfies all the axioms of a metric on ${\displaystyle {\mathcal {P}}_{2}(\mathbb {R} ^{d})}$. The triangle inequality is inherited from those of ${\displaystyle W_{2}}$ and of the ${\displaystyle L^{2}}$ norm on ${\displaystyle \mathbb {S} ^{d-1}}$, and the non-negativity and symmetry of ${\displaystyle W_{2}}$ yield the non-negativity and symmetry of ${\displaystyle SW_{2}}$. The tricky part lies in showing that ${\displaystyle SW_{2}(\mu ,\nu )=0}$ implies ${\displaystyle \mu =\nu }$. Note that if ${\displaystyle SW_{2}(\mu ,\nu )=0}$ then ${\displaystyle P_{\theta \#}\mu =P_{\theta \#}\nu }$ for almost every ${\displaystyle \theta \in \mathbb {S} ^{d-1}}$, and hence for every ${\displaystyle \theta }$, since ${\displaystyle \theta \mapsto W_{2}(P_{\theta \#}\mu ,P_{\theta \#}\nu )}$ is continuous. One can go from that observation to the conclusion that ${\displaystyle \mu =\nu }$ by appealing to the theory of Radon transforms.

Since ${\displaystyle |P_{\theta }(x)-P_{\theta }(y)|\leq |x-y|}$, one has ${\displaystyle W_{2}(P_{\theta \#}\mu ,P_{\theta \#}\nu )\leq W_{2}(\mu ,\nu )}$ for every ${\displaystyle \theta }$ (i.e. ${\displaystyle P_{\theta \#}}$ is 1-Lipschitz with respect to ${\displaystyle W_{2}}$). Integrating over the sphere gives ${\displaystyle SW_{2}(\mu ,\nu )\leq C\,W_{2}(\mu ,\nu )}$, where ${\displaystyle C=|\mathbb {S} ^{d-1}|^{1/2}}$ for the surface measure used in the definition and ${\displaystyle C=1}$ if that measure is normalized to total mass one. In either case the identity map on ${\displaystyle {\mathcal {P}}_{2}(\mathbb {R} ^{d})}$ is ${\displaystyle W_{2}}$-to-${\displaystyle SW_{2}}$-continuous. Moreover, if we restrict our domain to a compact ${\displaystyle \Omega \subseteq \mathbb {R} ^{d}}$, then ${\displaystyle ({\mathcal {P}}_{2}(\Omega ),W_{2})}$ is itself compact, so the identity map is a continuous bijection from a compact space to a Hausdorff space and must therefore be a homeomorphism. This shows that on compact domains ${\displaystyle SW_{2}}$ and ${\displaystyle W_{2}}$ induce the same topology.
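
The projection inequality can be checked directly. The following is a short sketch in which the optimal plan ${\displaystyle \gamma }$ and the normalized measure ${\displaystyle \sigma }$ on the sphere are notation introduced here for illustration. Let ${\displaystyle \gamma }$ be an optimal transport plan between ${\displaystyle \mu }$ and ${\displaystyle \nu }$ for ${\displaystyle W_{2}}$. Its pushforward by ${\displaystyle (x,y)\mapsto (P_{\theta }(x),P_{\theta }(y))}$ is a transport plan between ${\displaystyle P_{\theta \#}\mu }$ and ${\displaystyle P_{\theta \#}\nu }$, so

${\displaystyle W_{2}(P_{\theta \#}\mu ,P_{\theta \#}\nu )^{2}\leq \int |\theta \cdot (x-y)|^{2}\,d\gamma (x,y)\leq \int |x-y|^{2}\,d\gamma (x,y)=W_{2}(\mu ,\nu )^{2}}$

where the second inequality is the Cauchy-Schwarz inequality with ${\displaystyle |\theta |=1}$. Averaging over ${\displaystyle \theta }$ with respect to ${\displaystyle \sigma }$ then gives

${\displaystyle \int _{\mathbb {S} ^{d-1}}W_{2}(P_{\theta \#}\mu ,P_{\theta \#}\nu )^{2}\,d\sigma (\theta )\leq W_{2}(\mu ,\nu )^{2}}$

If the unnormalized surface measure is used instead, the right-hand side is multiplied by the total surface area ${\displaystyle |\mathbb {S} ^{d-1}|}$.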

## Computation

To approximate ${\displaystyle SW_{2}}$ numerically, one can discretize the sphere into finitely many directions and carry out the requisite 1D Wasserstein distance computation along each of them. As mentioned in the Motivation section, 1D Wasserstein distances are significantly simpler to compute, especially for empirical measures with equally sized supports, where the computation reduces to sorting. For further details on how to compute ${\displaystyle SW_{2}}$, see Peyré & Cuturi (pg. 166-169) [2].
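
The procedure described above can be sketched as follows. This is a hedged illustration, not the method of the references: it assumes two point clouds of equal size with equal weights, replaces a deterministic discretization of the sphere by randomly sampled directions (a Monte Carlo approximation), and averages over directions, which corresponds to a surface measure normalized to total mass one. The names `random_unit_vector`, `sliced_wasserstein_2`, `n_directions` and `seed` are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a direction uniformly on the unit sphere by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:  # guard against the (measure-zero) all-zero draw
            return [c / norm for c in v]


def sliced_wasserstein_2(X, Y, n_directions=200, seed=0):
    """Monte Carlo estimate of SW_2 between two equally sized point clouds.

    X and Y are lists of d-dimensional points (lists of floats), each point
    carrying mass 1/n. Assumption: directions are averaged, i.e. the measure
    on the sphere is normalized to total mass one.
    """
    if len(X) != len(Y) or len(X) == 0:
        raise ValueError("X and Y must be non-empty and of equal size")
    d = len(X[0])
    n = len(X)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_directions):
        theta = random_unit_vector(d, rng)
        # Project both clouds onto theta and sort the resulting 1D samples.
        px = sorted(sum(a * b for a, b in zip(x, theta)) for x in X)
        py = sorted(sum(a * b for a, b in zip(y, theta)) for y in Y)
        # Squared 1D Wasserstein distance via the sorted-sample formula.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / n
    return math.sqrt(total / n_directions)
```

To match a definition that uses the unnormalized surface measure, the squared estimate would be multiplied by the surface area of ${\displaystyle \mathbb {S} ^{d-1}}$. The accuracy of the estimate depends on `n_directions`; a fixed `seed` only makes the result reproducible.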

## References

1. Santambrogio, Filippo. "Optimal Transport for Applied Mathematicians" (2015)
2. Peyré, Gabriel & Cuturi, Marco. "Computational Optimal Transport" (2018)