# The Moreau-Yosida Regularization

The Moreau-Yosida regularization is a technique for approximating lower semicontinuous functions by Lipschitz functions. An important application is the proof of the Portmanteau Theorem, which states that integration against a lower semicontinuous function that is bounded below is lower semicontinuous with respect to narrow convergence in the space of probability measures.

## Definitions

Let ${\displaystyle (X,d)}$ be a metric space, and let ${\displaystyle {\mathcal {P}}(X)}$ denote the collection of probability measures on ${\displaystyle X}$. ${\displaystyle (X,d)}$ is said to be a Polish space if it is complete and separable.

A function ${\displaystyle g:X\to (-\infty ,+\infty ]}$ is said to be proper [1] if it is not identically equal to ${\displaystyle +\infty }$, that is, if there exists ${\displaystyle x\in X}$ such that ${\displaystyle g(x)<+\infty }$. The domain ${\displaystyle D(g)}$ of ${\displaystyle g}$ is the set

${\displaystyle D(g):=\{x\in X|g(x)<+\infty \}}$.

For a given function ${\displaystyle g:X\to (-\infty ,+\infty ]}$ and ${\displaystyle k\geq 0}$, its Moreau-Yosida regularization [1] ${\displaystyle g_{k}:X\to [-\infty ,+\infty ]}$ is given by

${\displaystyle g_{k}(x):=\inf \limits _{y\in X}\left[g(y)+kd(x,y)\right].}$
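As a rough numerical illustration of this definition, the following sketch replaces the infimum over ${\displaystyle X}$ by a minimum over a finite grid on the real line. The function name `moreau_yosida`, the grid, and the step function used as ${\displaystyle g}$ are assumptions made for the example, not part of the cited references.

```python
import numpy as np

def moreau_yosida(g_vals, grid, k):
    """Approximate g_k(x) = inf_y [g(y) + k * |x - y|] at every point of `grid`.

    The infimum over the whole space is replaced by a minimum over the finite
    grid, so the result can only overestimate the true g_k.
    """
    # dist[i, j] = |grid[i] - grid[j]|
    dist = np.abs(grid[:, None] - grid[None, :])
    # Row i holds g(y_j) + k * d(x_i, y_j); minimise over the candidates y_j.
    return np.min(g_vals[None, :] + k * dist, axis=1)

# Hypothetical test function: the step function 1_{(0, +inf)} on [-2, 2].
grid = np.arange(-200, 201) / 100.0
g_vals = np.where(grid > 0, 1.0, 0.0)

g_0 = moreau_yosida(g_vals, grid, k=0.0)   # constant, equal to inf g = 0
g_1 = moreau_yosida(g_vals, grid, k=1.0)   # min(1, x) for x > 0, and 0 for x <= 0
```

For this particular step function the grid minimum agrees with the exact regularization at the grid points, because the optimal ${\displaystyle y}$ is always either ${\displaystyle x}$ itself or ${\displaystyle 0}$, both of which lie on the grid.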


The distance term ${\displaystyle d(x,y)}$ may often be raised to a positive exponent ${\displaystyle p}$, in particular ${\displaystyle p=2}$. For example, when ${\displaystyle X}$ is a Hilbert space [2] [3], ${\displaystyle g_{k}}$ is taken to be

${\displaystyle g_{k}(x):=\inf \limits _{y\in X}\left[g(y)+{\frac {k}{2}}\|x-y\|^{2}\right].}$


This particular variant in a Hilbert space setting is explored in more detail below.

The dependence on the parameter ${\displaystyle k}$ may also be written instead as

${\displaystyle \inf \limits _{y\in X}\left[g(y)+{\frac {1}{\tau }}d(x,y)\right]}$

for ${\displaystyle \tau \in (0,+\infty )}$.

Note that

${\displaystyle g_{k}(x)=\inf \limits _{y\in X}\left[g(y)+kd(x,y)\right]\leq g(x)+kd(x,x)=g(x)}$.

## Examples

• If ${\displaystyle k=0}$, then by definition ${\displaystyle g_{0}}$ is constant and ${\displaystyle g_{0}\equiv \inf \limits _{y\in X}g(y)}$.
• If ${\displaystyle g}$ is not proper, then ${\displaystyle g_{k}=+\infty }$ for all ${\displaystyle k\geq 0}$.

Take ${\displaystyle (X,d):=(\mathbb {R} ,|\cdot |)}$. If ${\displaystyle g}$ is finite-valued and differentiable, we can often write down an expression for ${\displaystyle g_{k}}$. For a fixed ${\displaystyle x\in \mathbb {R} }$, the map ${\displaystyle g_{k,x}:y\mapsto g(y)+k|x-y|}$ is continuous everywhere and differentiable everywhere except at ${\displaystyle y=x}$, where the derivative generally fails to exist because of the absolute value. Provided the infimum is attained, we can apply standard optimization techniques from calculus to solve for ${\displaystyle g_{k}(x)}$: find the critical points of ${\displaystyle g_{k,x}}$ and take the minimum of ${\displaystyle g_{k,x}}$ over them. One of these values will always be ${\displaystyle g(x)}$, since ${\displaystyle y=x}$ is a critical point of ${\displaystyle g_{k,x}}$.

• Let ${\displaystyle g(x):=x^{2}}$. Then
${\displaystyle g_{k}(x)=\min \left\{x^{2},{\frac {k^{2}}{4}}+k\left|x\pm {\frac {k}{2}}\right|\right\}.}$
Plot of ${\displaystyle g(x)=x^{2}}$ and ${\displaystyle g_{k}(x)}$ for ${\displaystyle k=0,1,2,3}$.
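The critical-point recipe for ${\displaystyle g(x)=x^{2}}$ can be checked numerically. For ${\displaystyle y\neq x}$ the derivative of ${\displaystyle y\mapsto y^{2}+k|x-y|}$ vanishes only at ${\displaystyle y=\pm k/2}$, where the map takes the value ${\displaystyle k^{2}/4+k|x\mp k/2|}$. The sketch below compares the minimum over these candidates and ${\displaystyle x^{2}}$ with a brute-force minimum over a fine grid. The helper names and grids are assumptions chosen for the illustration.

```python
import numpy as np

def g_k_closed_form(x, k):
    """Minimum of y -> y^2 + k|x - y| over the critical points y = x and y = +-k/2."""
    at_x = x**2
    at_plus = k**2 / 4 + k * np.abs(x - k / 2)    # value at y = +k/2
    at_minus = k**2 / 4 + k * np.abs(x + k / 2)   # value at y = -k/2
    return np.minimum(at_x, np.minimum(at_plus, at_minus))

def g_k_brute_force(x, k, y):
    """Minimum of y^2 + k|x - y| over the candidate grid `y`, for each entry of `x`."""
    return np.min(y[None, :]**2 + k * np.abs(x[:, None] - y[None, :]), axis=1)

y = np.arange(-4000, 4001) / 1000.0   # fine grid on [-4, 4] for the inner minimum
x = np.arange(-30, 31) / 10.0         # evaluation points in [-3, 3]

errors = {}
for k in (0.0, 1.0, 2.0, 3.0):
    diff = g_k_closed_form(x, k) - g_k_brute_force(x, k, y)
    errors[k] = float(np.max(np.abs(diff)))
```

With these grids the two computations agree up to rounding error. Equivalently, ${\displaystyle g_{k}(x)=x^{2}}$ for ${\displaystyle |x|\leq k/2}$ and ${\displaystyle g_{k}(x)=k|x|-k^{2}/4}$ otherwise.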

## Approximating Lower Semicontinuous Functions by Lipschitz Functions

Proposition. [1][4] Let ${\displaystyle (X,d)}$ be a Polish space and let ${\displaystyle g:X\to (-\infty ,+\infty ]}$.

• If ${\displaystyle g}$ is proper and bounded below, so is ${\displaystyle g_{k}}$. Furthermore, ${\displaystyle g_{k}}$ is Lipschitz continuous for all ${\displaystyle k\geq 0}$.
• If, in addition, ${\displaystyle g}$ is lower semicontinuous, then ${\displaystyle g_{k}(x)\nearrow g(x)}$ as ${\displaystyle k\to \infty }$ for all ${\displaystyle x\in X}$.
• In this case, ${\displaystyle g_{k}\wedge k:=\min(g_{k},k)}$ is continuous and bounded and ${\displaystyle g_{k}(x)\wedge k\nearrow g(x)}$ as ${\displaystyle k\to \infty }$ for all ${\displaystyle x\in X}$.
Plot of ${\displaystyle g(x)=x^{2}}$ and ${\displaystyle g_{k}(x)\wedge k}$ for ${\displaystyle k=0,3,6,9}$.

Proof.

• Since ${\displaystyle g}$ is proper, there exists ${\displaystyle y_{0}\in X}$ such that ${\displaystyle g(y_{0})<+\infty }$. Then for any ${\displaystyle x\in X}$
${\displaystyle -\infty <\inf \limits _{y\in X}g(y)\leq g_{k}(x)\leq g(y_{0})+kd(x,y_{0})<+\infty .}$

Thus ${\displaystyle g_{k}}$ is proper and bounded below. Next, for a fixed ${\displaystyle y\in X}$, let ${\displaystyle h_{k,y}(x):=g(y)+kd(x,y)}$. Then as

${\displaystyle h_{k,y}(x_{1})-h_{k,y}(x_{2})=kd(x_{1},y)-kd(x_{2},y)\leq kd(x_{1},x_{2})}$ ,

the family ${\displaystyle \{h_{k,y}\}_{y\in X}}$ is uniformly Lipschitz with constant ${\displaystyle k}$ and hence equicontinuous. Thus ${\displaystyle g_{k}=\inf \limits _{y\in X}h_{k,y}}$ is Lipschitz continuous.

• Suppose that ${\displaystyle g}$ is also lower semicontinuous. Note that for all ${\displaystyle k_{1}\leq k_{2}}$, ${\displaystyle g_{k_{1}}(x)\leq g_{k_{2}}(x)\leq g(x)}$. Thus it suffices to show that ${\displaystyle \liminf \limits _{k\to \infty }g_{k}(x)\geq g(x)}$. This inequality is automatically satisfied when the left hand side is infinite, so without loss of generality assume that ${\displaystyle \liminf \limits _{k\to \infty }g_{k}(x)<+\infty }$. By definition of infimum, for each ${\displaystyle k\in \mathbb {N} }$ there exists ${\displaystyle y_{k}\in X}$ such that
${\displaystyle g(y_{k})+kd(x,y_{k})\leq g_{k}(x)+{\frac {1}{k}}}$.

Then

${\displaystyle +\infty >\liminf \limits _{k\to \infty }g_{k}(x)\geq \liminf \limits _{k\to \infty }\left[g(y_{k})+kd(x,y_{k})\right].}$

Since ${\displaystyle g(y_{k})}$ is bounded below by assumption, ${\displaystyle kd(x,y_{k})}$ must remain bounded as ${\displaystyle k\to \infty }$, which forces ${\displaystyle d(x,y_{k})}$ to vanish in the limit. Thus ${\displaystyle y_{k}}$ converges to ${\displaystyle x}$ in ${\displaystyle X}$, and by lower semicontinuity of ${\displaystyle g}$,

${\displaystyle \liminf \limits _{k\to \infty }g_{k}(x)\geq \liminf \limits _{k\to \infty }\left[g(y_{k})+kd(x,y_{k})\right]\geq g(x)}$.
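• Since ${\displaystyle g_{k}}$ is Lipschitz continuous and bounded below, the truncation ${\displaystyle g_{k}\wedge k}$ is continuous and bounded, that is, ${\displaystyle g_{k}\wedge k\in C_{b}(X)}$. Since ${\displaystyle g_{k}(x)\nearrow g(x)}$ for all ${\displaystyle x\in X}$, ${\displaystyle g_{k}(x)\wedge k\nearrow g(x)}$ for all ${\displaystyle x\in X}$.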
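The three statements of the proposition can be observed numerically on a simple lower semicontinuous function. The sketch below uses a grid approximation on the real line with the step function ${\displaystyle 1_{(0,+\infty )}}$. The helper names, the grid, and the list of values of ${\displaystyle k}$ are assumptions made for the illustration.

```python
import numpy as np

def moreau_yosida(g_vals, grid, k):
    """Grid approximation of g_k(x) = inf_y [g(y) + k * |x - y|]."""
    dist = np.abs(grid[:, None] - grid[None, :])
    return np.min(g_vals[None, :] + k * dist, axis=1)

def lipschitz_estimate(vals, grid):
    """Largest difference quotient between neighbouring grid points."""
    return float(np.max(np.abs(np.diff(vals)) / np.diff(grid)))

grid = np.arange(-200, 201) / 100.0
g_vals = np.where(grid > 0, 1.0, 0.0)   # lower semicontinuous step function

ks = [0.0, 1.0, 2.0, 4.0, 8.0, 100.0]
envelopes = [moreau_yosida(g_vals, grid, k) for k in ks]
truncated = [np.minimum(env, k) for env, k in zip(envelopes, ks)]   # g_k ∧ k

slopes = [lipschitz_estimate(env, grid) for env in envelopes]   # each at most k
gaps = [float(np.max(g_vals - env)) for env in envelopes]       # max over the grid of g - g_k
```

The estimated slopes never exceed ${\displaystyle k}$, the envelopes increase with ${\displaystyle k}$, and they stay below ${\displaystyle g}$. The gap closes at every grid point once ${\displaystyle k}$ is large enough, but the convergence is only pointwise: near the jump at ${\displaystyle 0}$, a larger ${\displaystyle k}$ is needed the closer ${\displaystyle x>0}$ is to ${\displaystyle 0}$.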

## Portmanteau Theorem

Theorem (Portmanteau). [1] [4] Let ${\displaystyle (X,d)}$ be a Polish space, and let ${\displaystyle g:X\to (-\infty ,+\infty ]}$ be lower semicontinuous and bounded below. Then the functional ${\displaystyle \mu \mapsto \int _{X}g\,\mathrm {d} \mu }$ is lower semicontinuous with respect to narrow convergence in ${\displaystyle {\mathcal {P}}(X)}$, that is

${\displaystyle \mu _{n}\to \mu {\text{ narrowly}}\Longrightarrow \liminf \limits _{n\to \infty }\int _{X}g\,\mathrm {d} \mu _{n}\geq \int _{X}g\,\mathrm {d} \mu }$.


Proof. By the Moreau-Yosida approximation, for all ${\displaystyle k\geq 0}$,

${\displaystyle \liminf \limits _{n\to \infty }\int _{X}g\,\mathrm {d} \mu _{n}\geq \liminf \limits _{n\to \infty }\int _{X}g_{k}\wedge k\,\mathrm {d} \mu _{n}=\int _{X}g_{k}\wedge k\,\mathrm {d} \mu }$.

Taking ${\displaystyle k\to \infty }$, Fatou's Lemma ensures that

${\displaystyle \liminf \limits _{n\to \infty }\int _{X}g\,\mathrm {d} \mu _{n}\geq \liminf \limits _{k\to \infty }\int _{X}g_{k}\wedge k\,\mathrm {d} \mu \geq \int _{X}g\,\mathrm {d} \mu }$.
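A small example shows that the inequality in the theorem can be strict. Take ${\displaystyle X=\mathbb {R} }$, let ${\displaystyle \mu _{n}}$ be the Dirac mass at ${\displaystyle 1/n}$, which converges narrowly to the Dirac mass ${\displaystyle \mu }$ at ${\displaystyle 0}$, and let ${\displaystyle g}$ be the indicator of the open set ${\displaystyle \mathbb {R} \setminus \{0\}}$. The sketch below is an illustration under these assumptions. Integrals against Dirac masses reduce to point evaluations, and the closed form used for ${\displaystyle g_{k}}$ is specific to this ${\displaystyle g}$.

```python
def g(x):
    """Indicator of the open set R minus {0}: lower semicontinuous, bounded below."""
    return 0.0 if x == 0 else 1.0

def g_k(x, k):
    """Closed form of inf_y [g(y) + k * |x - y|] for this particular g."""
    # Either stay at y = x (cost g(x)) or move to y = 0 (cost k * |x|).
    return min(g(x), k * abs(x))

def integrate_dirac(f, point):
    """Integral of f against the Dirac mass at `point`."""
    return f(point)

ns = range(1, 1001)
lhs = [integrate_dirac(g, 1.0 / n) for n in ns]   # integrals of g against mu_n
rhs = integrate_dirac(g, 0.0)                     # integral of g against mu

k = 5.0
approx = [integrate_dirac(lambda x: min(g_k(x, k), k), 1.0 / n) for n in ns]
```

Here every entry of `lhs` equals 1 while `rhs` equals 0, so the liminf strictly exceeds the limit integral. The integrals of the continuous bounded approximation ${\displaystyle g_{k}\wedge k}$ in `approx` do converge to its integral against ${\displaystyle \mu }$, which is 0, as narrow convergence requires.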

## Etymology of Portmanteau Theorem

The curious epithet attached to the above theorem is due to Billingsley [5], who cites a paper by one Jean-Pierre Portmanteau, Espoir pour l'ensemble vide?, published in Annales de l'Université de Felletin in 1915. This is believed to be a fictional citation made as a play on words [6].

• The publication date is far too early; Kolmogorov's probability axioms were published in 1933. [7]
• Felletin is a small town in central France with no university, and there is no record of a Jean-Pierre Portmanteau aside from this citation.
• "Espoir pour l'ensemble vide?" translates to "Hope for the empty set?".

## Generalizations

The Moreau-Yosida regularization is a specific case of a type of convolution, and many of the above results follow from this generalization. This material is adapted from Bauschke-Combettes Chapter 12 [2], where the setting is over a Hilbert space instead of a more general Polish space.

Let ${\displaystyle {\mathcal {H}}}$ be a Hilbert space, and let ${\displaystyle f,g:{\mathcal {H}}\to (-\infty ,+\infty ]}$. The infimal convolution or epi-sum ${\displaystyle f\,\square \,g:{\mathcal {H}}\to [-\infty ,+\infty ]}$ of ${\displaystyle f}$ and ${\displaystyle g}$ is

${\displaystyle (f\,\square \,g)(x):=\inf \limits _{y\in {\mathcal {H}}}\left[f(y)+g(x-y)\right]}$.


${\displaystyle f\,\square \,g}$ is said to be exact at a point ${\displaystyle x\in {\mathcal {H}}}$ if this infimum is attained. ${\displaystyle f\,\square \,g}$ is said to be exact if it is exact at every point of its domain, and in this case it is denoted by ${\displaystyle f\,{\dot {\square }}\,g}$.

Remark. Bauschke-Combettes uses a box with a dot in the middle for ${\displaystyle f\,\square \,g}$ to be exact. Due to technical difficulties, we will use ${\displaystyle f\,{\dot {\square }}\,g}$ instead.

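For an example, let ${\displaystyle A,B\subseteq {\mathcal {H}}}$ be nonempty, and let ${\displaystyle \chi _{A}}$ denote the indicator function that equals ${\displaystyle 0}$ on ${\displaystyle A}$ and ${\displaystyle +\infty }$ elsewhere. Then ${\displaystyle \chi _{A}\,\square \,\chi _{B}}$ is exact, and ${\displaystyle \chi _{A}\,{\dot {\square }}\,\chi _{B}=\chi _{A+B}}$.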
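The indicator example can be sketched for finite subsets of the real line, assuming ${\displaystyle \chi _{A}}$ is the indicator that equals ${\displaystyle 0}$ on ${\displaystyle A}$ and ${\displaystyle +\infty }$ elsewhere. The helper names `indicator` and `inf_conv` and the sets below are hypothetical choices for the illustration.

```python
import math

def indicator(S):
    """Convex-analytic indicator of S: 0 on S and +inf elsewhere."""
    return lambda x: 0.0 if x in S else math.inf

def inf_conv(f, g, candidates):
    """(f box g)(x) = inf_y [f(y) + g(x - y)], with y ranging over `candidates`."""
    return lambda x: min(f(y) + g(x - y) for y in candidates)

A = {0, 1}
B = {0, 10}
# chi_A is +inf off A, so restricting y to A does not change the infimum.
h = inf_conv(indicator(A), indicator(B), candidates=A)

minkowski_sum = {a + b for a in A for b in B}   # {0, 1, 10, 11}
values = {x: h(x) for x in range(-2, 13)}
```

The computed values are ${\displaystyle 0}$ exactly on the Minkowski sum ${\displaystyle A+B}$ and ${\displaystyle +\infty }$ at the other test points. The infimum is attained whenever it is finite, which is what exactness means here.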

Proposition. Let ${\displaystyle g:{\mathcal {H}}\to (-\infty ,+\infty ]}$ be proper, ${\displaystyle p\in [1,+\infty )}$, and for ${\displaystyle k\in (0,+\infty )}$, let ${\displaystyle g_{k}:{\mathcal {H}}\to [-\infty ,+\infty ]}$ be given by

${\displaystyle g_{k}:=g\,\square \,\left({\frac {k}{p}}\|\cdot \|^{p}\right)}$.

Then the following hold for all ${\displaystyle k\in (0,+\infty )}$ and ${\displaystyle x\in {\mathcal {H}}}$:

• ${\displaystyle D(g_{k})={\mathcal {H}}}$,
• for ${\displaystyle 0<k_{1}\leq k_{2}<+\infty }$, ${\displaystyle \inf \limits _{y\in {\mathcal {H}}}g(y)\leq g_{k_{1}}(x)\leq g_{k_{2}}(x)\leq g(x)}$,
• ${\displaystyle \inf \limits _{x\in {\mathcal {H}}}g_{k}(x)=\inf \limits _{x\in {\mathcal {H}}}g(x)}$,
• ${\displaystyle g_{k}(x)\searrow \inf \limits _{y\in {\mathcal {H}}}g(y)}$ as ${\displaystyle k\downarrow 0}$, and
• ${\displaystyle g_{k}}$ is bounded above on every ball in ${\displaystyle {\mathcal {H}}}$.

Remark. The convention given above differs slightly from Bauschke-Combettes to fit the convention in this article. The Moreau-Yosida regularization is the special case where ${\displaystyle p=1}$, and is called the Pasch-Hausdorff Envelope in Bauschke-Combettes.
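Several items of the proposition can be observed on a grid in ${\displaystyle {\mathcal {H}}=\mathbb {R} }$. The sketch below assumes ${\displaystyle g(y)=|y|}$ and ${\displaystyle p=2}$. The function name `envelope`, the grid, and the values of ${\displaystyle k}$ are choices made for the illustration.

```python
import numpy as np

def envelope(g_vals, grid, k, p):
    """Grid approximation of inf_y [g(y) + (k / p) * |x - y|**p]."""
    dist = np.abs(grid[:, None] - grid[None, :])
    return np.min(g_vals[None, :] + (k / p) * dist**p, axis=1)

grid = np.arange(-500, 501) / 100.0   # [-5, 5] with spacing 0.01
g_vals = np.abs(grid)                 # hypothetical choice g(y) = |y|, so inf g = 0

p = 2
ks = [0.01, 0.1, 1.0, 10.0]
envs = [envelope(g_vals, grid, k, p) for k in ks]

infima = [float(env.min()) for env in envs]    # each should equal inf g = 0
at_three = [float(env[800]) for env in envs]   # values of g_k at x = 3
```

The infimum is the same for every ${\displaystyle k}$, the values at ${\displaystyle x=3}$ increase with ${\displaystyle k}$ while staying below ${\displaystyle g(3)=3}$, and they approach ${\displaystyle \inf g=0}$ as ${\displaystyle k}$ decreases.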

Proposition. Let ${\displaystyle g:{\mathcal {H}}\to (-\infty ,+\infty ]}$ be proper, lower semicontinuous, and convex, let ${\displaystyle k\in (0,+\infty )}$, and let ${\displaystyle p\in (1,+\infty )}$. Then the infimal convolution ${\displaystyle g_{k}}$ is convex, proper, continuous, and exact. Moreover, for every ${\displaystyle x\in {\mathcal {H}}}$, the infimum

${\displaystyle g_{k}(x)=\inf \limits _{y\in {\mathcal {H}}}\left[g(y)+{\frac {k}{p}}\|x-y\|^{p}\right]}$

is uniquely attained.
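For a concrete instance of the unique minimizer, take ${\displaystyle {\mathcal {H}}=\mathbb {R} }$, ${\displaystyle g(y)=|y|}$ and ${\displaystyle p=2}$. The first-order optimality condition then gives the soft-thresholding formula ${\displaystyle y^{*}=\operatorname {sign} (x)\max(|x|-1/k,0)}$. The sketch below compares this formula with a brute-force search. The grid, the value of ${\displaystyle k}$, and the test points are assumptions for the illustration.

```python
import numpy as np

def soft_threshold(x, k):
    """Candidate minimiser of y -> |y| + (k / 2) * (x - y)**2 on the real line."""
    return np.sign(x) * max(abs(x) - 1.0 / k, 0.0)

def argmin_on_grid(x, k, y):
    """Brute-force minimiser of the same objective over the grid `y`."""
    objective = np.abs(y) + 0.5 * k * (x - y) ** 2
    return float(y[np.argmin(objective)])

y = np.arange(-3000, 3001) / 1000.0   # grid on [-3, 3] with spacing 0.001
k = 2.0
points = [-2.0, -0.3, 0.0, 0.3, 1.5]
pairs = [(float(soft_threshold(x, k)), argmin_on_grid(x, k, y)) for x in points]
```

Each pair agrees to within the grid spacing. Points with ${\displaystyle |x|\leq 1/k}$ are mapped to ${\displaystyle 0}$, and the others are moved toward the origin by ${\displaystyle 1/k}$.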

## References

1. Craig, Katy C. Lower Semicontinuity in the Narrow Topology. Math 260J. Univ. of Ca. at Santa Barbara. Winter 2022.
2. Bauschke, Heinz H. and Patrick L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd Ed. Ch. 12. Springer, 2017.
3. Ambrosio, Luigi, Nicola Gigli, and Giuseppe Savaré. Gradient Flows in Metric Spaces and in the Space of Probability Measures. Ch. 3.1. Birkhäuser, 2005.
4. Santambrogio, Filippo. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling. Ch. 1.1. Birkhäuser, 2015.
5. Billingsley, Patrick. Convergence of Probability Measures, 2nd Ed. John Wiley & Sons, Inc. 1999.
6. Pagès, Gilles. Numerical Probability: An Introduction with Applications to Finance. Ch. 4.1. Springer, 2018.
7. Kolmogorov, Andrey (1950) [1933]. Foundations of the theory of probability. New York, USA: Chelsea Publishing Company.