From 7dcfebbef19e1f94928fc71face612a2f71372b4 Mon Sep 17 00:00:00 2001 From: RĂ©mi Flamary Date: Fri, 28 Jun 2019 08:33:36 +0200 Subject: entropic mostly done, starting general regularization --- docs/source/quickstart.rst | 144 +++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 133 insertions(+), 11 deletions(-) diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst index c122d17..4f2d9bb 100644 --- a/docs/source/quickstart.rst +++ b/docs/source/quickstart.rst @@ -5,6 +5,9 @@ Quick start guide In the following we provide some pointers about which functions and classes to use for different problems related to optimal transport (OT). +This document is not a tutorial on numerical optimal transport. For this we strongly +recommend to read the very nice book [15]_ . + Optimal transport and Wasserstein distance ------------------------------------------ @@ -20,10 +23,11 @@ Solving optimal transport The optimal transport problem between discrete distributions is often expressed as - .. math:: - \gamma^* = arg\min_\gamma \quad \sum_{i,j}\gamma_{i,j}M_{i,j} - s.t. \gamma 1 = a; \gamma^T 1= b; \gamma\geq 0 +.. math:: + \gamma^* = arg\min_\gamma \quad \sum_{i,j}\gamma_{i,j}M_{i,j} + + s.t. \gamma 1 = a; \gamma^T 1= b; \gamma\geq 0 where : @@ -120,8 +124,6 @@ distributions. In this case when the finite sample dataset is supposed gaussian, mapping. - - Regularized Optimal Transport ----------------------------- @@ -146,6 +148,7 @@ We discuss in the following specific algorithms that can be used depending on the regularization term. + Entropic regularized OT ^^^^^^^^^^^^^^^^^^^^^^^ @@ -168,23 +171,107 @@ solution of the resulting optimization problem can be expressed as: \gamma_\lambda^*=\text{diag}(u)K\text{diag}(v) where :math:`u` and :math:`v` are vectors and :math:`K=\exp(-M/\lambda)` where -the :math:`\exp` is taken component-wise. +the :math:`\exp` is taken component-wise. In order to solve the optimization +problem, on can use an alternative projection algorithm that can be very +efficient for large values if regularization. + +The main function is POT are :any:`ot.sinkhorn` and +:any:`ot.sinkhorn2` that return respectively the OT matrix and the value of the +linear term. Note that the regularization parameter :math:`\lambda` in the +equation above is given to those function with the parameter :code:`reg`. + >>> import ot + >>> a=[.5,.5] + >>> b=[.5,.5] + >>> M=[[0.,1.],[1.,0.]] + >>> ot.sinkhorn(a,b,M,1) + array([[ 0.36552929, 0.13447071], + [ 0.13447071, 0.36552929]]) +More details about the algorithm used is given in the following note. + + +.. note:: + The main function to solve entropic regularized OT is :any:`ot.sinkhorn`. + This function is a wrapper and the parameter :code:`method` help you select + the actual algorithm used to solve the problem: + + + :code:`method='sinkhorn'` calls :any:`ot.bregman.sinkhorn_knopp` the + classic algorithm [2]_. + + :code:`method='sinkhorn_stabilized'` calls :any:`ot.bregman.sinkhorn_stabilized` the + log stabilized version of the algorithm [9]_. + + :code:`method='sinkhorn_epsilon_scaling'` calls + :any:`ot.bregman.sinkhorn_epsilon_scaling` the epsilon scaling version + of the algorithm [9]_. + + :code:`method='greenkhorn'` calls :any:`ot.bregman.greenkhorn` the + greedy sinkhorn verison of the algorithm [22]_. + + In addition to all those variants of sinkhorn, we have another + implementation solving the problem in the smooth dual or semi-dual in + :any:`ot.smooth`. This solver use the :any:`scipy.optimize.minimize` + function to solve the smooth problem with :code:`L-BFGS` algorithm. Tu use + this solver, use functions :any:`ot.smooth.smooth_ot_dual` or + :any:`ot.smooth.smooth_ot_semi_dual` with parameter :code:`reg_type='kl'` to + choose entropic/Kullbach Leibler regularization. + +.. hint:: + Examples of use for :any:`ot.sinkhorn` are available in the following examples: + + - :any:`auto_examples/plot_OT_2D_samples` + - :any:`auto_examples/plot_OT_1D` + - :any:`auto_examples/plot_OT_1D_smooth` + - :any:`auto_examples/plot_stochastic` + +Finally note that we also provide in :any:`ot.stochastic` several implementation +of stochastic solvers for entropic regularized OT [18]_ [19]_. Other regularization ^^^^^^^^^^^^^^^^^^^^ -Stochastic gradient descent -^^^^^^^^^^^^^^^^^^^^^^^^^^^ +While entropic OT is the most common and favored in practice, there exist other +kind of regularization. We provide in POT two specific solvers for other +regularization terms: namely quadratic regularization and group lasso +regularization. But we also provide in :any:`ot.optim` two generic solvers that allows solving any +smooth regularization in practice. + +The first general regularization term we can solve is the quadratic +regularization of the form + +.. math:: + \Omega(\gamma)=\sum_{i,j} \gamma_{i,j}^2 + +this regularization term has a similar effect to entropic regularization in +densifying the OT matrix but it keeps some sort of sparsity that is lost with +entropic regularization as soon as :math:`\lambda>0` [17]_. This problem cen be +solved with POT using solvers from :any:`ot.smooth`, more specifically +functions :any:`ot.smooth.smooth_ot_dual` or +:any:`ot.smooth.smooth_ot_semi_dual` with parameter :code:`reg_type='l2'` to +choose the quadratic regularization. + +Another regularization that has been used in recent years is the group lasso +regularization + +.. math:: + \Omega(\gamma)=\sum_{j,G\in\mathcal{G}} \|\gamma_{G,j}\|_p^q + +where :math:`\mathcal{G}` contains non overlapping groups of lines in the OT +matrix. This regularization proposed in [5]_ will promote sparsity at the group level and for +instance will force target samples to get mass from a small number of groups. +Note that the exact OT solution is already sparse so this regularization does +not make sens if it is not combined with others such as entropic. + + + + + Wasserstein Barycenters ----------------------- Monge mapping and Domain adaptation with Optimal transport ----------------------------------------- +---------------------------------------------------------- Other applications @@ -207,7 +294,6 @@ FAQ the OT transport matrix. If you want to solve a regularized OT you can use :py:mod:`ot.sinkhorn`. - Here is a simple use case: @@ -222,7 +308,43 @@ FAQ :doc:`auto_examples/plot_OT_2D_samples` -2. **Compute a Wasserstein distance** +2. **pip install POT fails with error : ImportError: No module named Cython.Build** + + As discussed shortly in the README file. POT requires to have :code:`numpy` + and :code:`cython` installed to build. This corner case is not yet handled + by :code:`pip` and for now you need to install both library prior to + installing POT. + + Note that this problem do not occur when using conda-forge since the packages + there are pre-compiled. + + See `Issue #59 `__ for more + details. + +3. **Why is Sinkhorn slower than EMD ?** + + This might come from the choice of the regularization term. The speed of + convergence of sinkhorn depends directly on this term [22]_ and when the + regularization gets very small the problem try and approximate the exact OT + which leads to slow convergence in addition to numerical problems. In other + words, for large regularization sinkhorn will be very fast to converge, for + small regularization (when you need an OT matrix close to the true OT), it + might be quicker to use the EMD solver. + + Also note that the numpy implementation of the sinkhorn can use parallel + computation depending on the configuration of your system but very important + speedup can be obtained by using a GPU implementation since all operations + are matrix/vector products. + +4. **Using GPU fails with error: module 'ot' has no attribute 'gpu'** + + In order to limit import time and hard dependencies in POT. we do not import + some sub-modules automatically with :code:`import ot`. In order to use the + acceleration in :any:`ot.gpu` you need first to import is with + :code:`import ot.gpu`. + + See `Issue #85 `__ and :any:`ot.gpu` + for more details. References -- cgit v1.2.3