From a96caaed079b6e368df090c3e3290398e8b4a99e Mon Sep 17 00:00:00 2001 From: Hicham Janati Date: Tue, 3 Sep 2019 18:15:38 +0200 Subject: update UOT paragraph in quickstart --- docs/source/quickstart.rst | 107 ++++++++++++++++++++++++++------------------- 1 file changed, 61 insertions(+), 46 deletions(-) diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst index 1640d6a..e4ab3a3 100644 --- a/docs/source/quickstart.rst +++ b/docs/source/quickstart.rst @@ -2,13 +2,13 @@ Quick start guide ================= -In the following we provide some pointers about which functions and classes +In the following we provide some pointers about which functions and classes to use for different problems related to optimal transport (OT) and machine learning. We refer when we can to concrete examples in the documentation that are also available as notebooks on the POT Github. This document is not a tutorial on numerical optimal transport. For this we strongly -recommend to read the very nice book [15]_ . +recommend to read the very nice book [15]_ . Optimal transport and Wasserstein distance @@ -55,8 +55,8 @@ solver is quite efficient and uses sparsity of the solution. Examples of use for :any:`ot.emd` are available in : - :any:`auto_examples/plot_OT_2D_samples` - - :any:`auto_examples/plot_OT_1D` - - :any:`auto_examples/plot_OT_L1_vs_L2` + - :any:`auto_examples/plot_OT_1D` + - :any:`auto_examples/plot_OT_L1_vs_L2` Computing Wasserstein distance @@ -102,13 +102,13 @@ distance. An example of use for :any:`ot.emd2` is available in : - :any:`auto_examples/plot_compute_emd` - + Special cases ^^^^^^^^^^^^^ Note that the OT problem and the corresponding Wasserstein distance can in some -special cases be computed very efficiently. +special cases be computed very efficiently. For instance when the samples are in 1D, then the OT problem can be solved in :math:`O(n\log(n))` by using a simple sorting. In this case we provide the @@ -117,13 +117,13 @@ matrix and value. Note that since the solution is very sparse the :code:`sparse` parameter of :any:`ot.emd_1d` allows for solving and returning the solution for very large problems. Note that in order to compute directly the :math:`W_p` Wasserstein distance in 1D we provide the function :any:`ot.wasserstein_1d` that -takes :code:`p` as a parameter. +takes :code:`p` as a parameter. Another special case for estimating OT and Monge mapping is between Gaussian distributions. In this case there exists a close form solution given in Remark 2.29 in [15]_ and the Monge mapping is an affine function and can be also computed from the covariances and means of the source and target -distributions. In the case when the finite sample dataset is supposed gaussian, we provide +distributions. In the case when the finite sample dataset is supposed gaussian, we provide :any:`ot.da.OT_mapping_linear` that returns the parameters for the Monge mapping. @@ -176,7 +176,7 @@ solution of the resulting optimization problem can be expressed as: where :math:`u` and :math:`v` are vectors and :math:`K=\exp(-M/\lambda)` where the :math:`\exp` is taken component-wise. In order to solve the optimization problem, on can use an alternative projection algorithm called Sinkhorn-Knopp that can be very -efficient for large values if regularization. +efficient for large values if regularization. The Sinkhorn-Knopp algorithm is implemented in :any:`ot.sinkhorn` and :any:`ot.sinkhorn2` that return respectively the OT matrix and the value of the @@ -201,12 +201,12 @@ More details about the algorithms used are given in the following note. + :code:`method='sinkhorn'` calls :any:`ot.bregman.sinkhorn_knopp` the classic algorithm [2]_. + :code:`method='sinkhorn_stabilized'` calls :any:`ot.bregman.sinkhorn_stabilized` the - log stabilized version of the algorithm [9]_. + log stabilized version of the algorithm [9]_. + :code:`method='sinkhorn_epsilon_scaling'` calls :any:`ot.bregman.sinkhorn_epsilon_scaling` the epsilon scaling version - of the algorithm [9]_. + of the algorithm [9]_. + :code:`method='greenkhorn'` calls :any:`ot.bregman.greenkhorn` the - greedy sinkhorn verison of the algorithm [22]_. + greedy sinkhorn verison of the algorithm [22]_. In addition to all those variants of sinkhorn, we have another implementation solving the problem in the smooth dual or semi-dual in @@ -236,7 +236,7 @@ of algorithms in [18]_ [19]_. Examples of use for :any:`ot.sinkhorn` are available in : - :any:`auto_examples/plot_OT_2D_samples` - - :any:`auto_examples/plot_OT_1D` + - :any:`auto_examples/plot_OT_1D` - :any:`auto_examples/plot_OT_1D_smooth` - :any:`auto_examples/plot_stochastic` @@ -248,13 +248,13 @@ While entropic OT is the most common and favored in practice, there exist other kind of regularization. We provide in POT two specific solvers for other regularization terms, namely quadratic regularization and group lasso regularization. But we also provide in :any:`ot.optim` two generic solvers that allows solving any -smooth regularization in practice. +smooth regularization in practice. Quadratic regularization """""""""""""""""""""""" The first general regularization term we can solve is the quadratic -regularization of the form +regularization of the form .. math:: \Omega(\gamma)=\sum_{i,j} \gamma_{i,j}^2 @@ -264,7 +264,7 @@ densifying the OT matrix but it keeps some sort of sparsity that is lost with entropic regularization as soon as :math:`\lambda>0` [17]_. This problem can be solved with POT using solvers from :any:`ot.smooth`, more specifically functions :any:`ot.smooth.smooth_ot_dual` or -:any:`ot.smooth.smooth_ot_semi_dual` with parameter :code:`reg_type='l2'` to +:any:`ot.smooth.smooth_ot_semi_dual` with parameter :code:`reg_type='l2'` to choose the quadratic regularization. .. hint:: @@ -300,7 +300,7 @@ gradient algorithm [7]_ in function .. hint:: Examples of group Lasso regularization are available in : - - :any:`auto_examples/plot_otda_classes` + - :any:`auto_examples/plot_otda_classes` - :any:`auto_examples/plot_otda_d2` @@ -311,7 +311,7 @@ Finally we propose in POT generic solvers that can be used to solve any regularization as long as you can provide a function computing the regularization and a function computing its gradient (or sub-gradient). -In order to solve +In order to solve .. math:: \gamma^* = arg\min_\gamma \quad \sum_{i,j}\gamma_{i,j}M_{i,j} + \lambda\Omega(\gamma) @@ -336,12 +336,12 @@ Another generic solver is proposed to solve the problem where :math:`\Omega_e` is the entropic regularization. In this case we use a generalized conditional gradient [7]_ implemented in :any:`ot.optim.gcg` that does not linearize the entropic term but -relies on :any:`ot.sinkhorn` for its iterations. +relies on :any:`ot.sinkhorn` for its iterations. .. hint:: An example of generic solvers are available in : - - :any:`auto_examples/plot_optim_OTreg` + - :any:`auto_examples/plot_optim_OTreg` Wasserstein Barycenters @@ -382,7 +382,7 @@ solver :any:`ot.lp.barycenter` that rely on generic LP solvers. By default the function uses :any:`scipy.optimize.linprog`, but more efficient LP solvers from cvxopt can be also used by changing parameter :code:`solver`. Note that this problem requires to solve a very large linear program and can be very slow in -practice. +practice. Similarly to the OT problem, OT barycenters can be computed in the regularized case. When using entropic regularization is used, the problem can be solved with a @@ -403,11 +403,11 @@ operators. We provide an implementation of this algorithm in function Examples of Wasserstein (:any:`ot.lp.barycenter`) and regularized Wasserstein barycenter (:any:`ot.bregman.barycenter`) computation are available in : - - :any:`auto_examples/plot_barycenter_1D` - - :any:`auto_examples/plot_barycenter_lp_vs_entropic` + - :any:`auto_examples/plot_barycenter_1D` + - :any:`auto_examples/plot_barycenter_lp_vs_entropic` An example of convolutional barycenter - (:any:`ot.bregman.convolutional_barycenter2d`) computation + (:any:`ot.bregman.convolutional_barycenter2d`) computation for 2D images is available in : @@ -451,13 +451,13 @@ optimal mapping is still an open problem in the general case but has been proven for smooth distributions by Brenier in his eponym `theorem `__. We provide in :any:`ot.da` several solvers for smooth Monge mapping estimation and domain -adaptation from discrete distributions. +adaptation from discrete distributions. Monge Mapping estimation ^^^^^^^^^^^^^^^^^^^^^^^^ We now discuss several approaches that are implemented in POT to estimate or -approximate a Monge mapping from finite distributions. +approximate a Monge mapping from finite distributions. First note that when the source and target distributions are supposed to be Gaussian distributions, there exists a close form solution for the mapping and its an @@ -513,16 +513,16 @@ A list of the provided implementation is given in the following note. Here is a list of the OT mapping classes inheriting from :any:`ot.da.BaseTransport` - + * :any:`ot.da.EMDTransport` : Barycentric mapping with EMD transport * :any:`ot.da.SinkhornTransport` : Barycentric mapping with Sinkhorn transport * :any:`ot.da.SinkhornL1l2Transport` : Barycentric mapping with Sinkhorn + group Lasso regularization [5]_ * :any:`ot.da.SinkhornLpl1Transport` : Barycentric mapping with Sinkhorn + - non convex group Lasso regularization [5]_ + non convex group Lasso regularization [5]_ * :any:`ot.da.LinearTransport` : Linear mapping estimation between Gaussians [14]_ - * :any:`ot.da.MappingTransport` : Nonlinear mapping estimation [8]_ + * :any:`ot.da.MappingTransport` : Nonlinear mapping estimation [8]_ .. hint:: @@ -550,7 +550,7 @@ consist in finding a linear projector optimizing the following criterion .. math:: P = \text{arg}\min_P \frac{\sum_i OT_e(\mu_i\#P,\mu_i\#P)}{\sum_{i,j\neq i} OT_e(\mu_i\#P,\mu_j\#P)} - + where :math:`\#` is the push-forward operator, :math:`OT_e` is the entropic OT loss and :math:`\mu_i` is the distribution of samples from class :math:`i`. :math:`P` is also constrained to @@ -575,10 +575,10 @@ respectively. Note that we also provide the Fisher discriminant estimator in Unbalanced optimal transport ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Unbalanced OT is a relaxation of the original OT problem where the violation of +Unbalanced OT is a relaxation of the entropy regularized OT problem where the violation of the constraint on the marginals is added to the objective of the optimization problem: - + .. math:: \min_\gamma \quad \sum_{i,j}\gamma_{i,j}M_{i,j} + reg\cdot\Omega(\gamma) + \alpha KL(\gamma 1, a) + \alpha KL(\gamma^T 1, b) @@ -589,9 +589,24 @@ where KL is the Kullback-Leibler divergence. This formulation allows for computing approximate mapping between distributions that do not have the same amount of mass. Interestingly the problem can be solved with a generalization of the Bregman projections algorithm [10]_. We provide a solver for unbalanced OT -in :any:`ot.unbalanced` and more specifically -in function :any:`ot.sinkhorn_unbalanced`. A solver for unbalanced OT barycenter -is available in :any:`ot.barycenter_unbalanced`. +in :any:`ot.unbalanced`. Computing the optimal transport +plan or the transport cost is similar to the balanced case. The Sinkhorn-Knopp +algorithm is implemented in :any:`ot.sinkhorn_unbalanced` and :any:`ot.sinkhorn_unbalanced2` +that return respectively the OT matrix and the value of the +linear term. Note that the regularization parameter :math:`\alpha` in the +equation above is given to those functions with the parameter :code:`reg_m`. + +Similarly, Unbalanced OT barycenters can be computed using :any:`ot.barycenter_unbalanced`. + +.. note:: + The main function to solve entropic regularized OT is :any:`ot.sinkhorn_unbalanced`. + This function is a wrapper and the parameter :code:`method` help you select + the actual algorithm used to solve the problem: + + + :code:`method='sinkhorn'` calls :any:`ot.unbalanced.sinkhorn_knopp_unbalanced` + the generalized Sinkhorn algorithm [10]_. + + :code:`method='sinkhorn_stabilized'` calls :any:`ot.unbalanced.sinkhorn_stabilized_unbalanced` + the log stabilized version of the algorithm [10]_. .. hint:: @@ -636,7 +651,7 @@ barycenters that can be expressed as where :math:`Ck` is the distance matrix between samples in distribution :math:`k`. Note that interestingly the barycenter is defined as a symmetric -positive matrix. We provide a block coordinate optimization procedure in +positive matrix. We provide a block coordinate optimization procedure in :any:`ot.gromov.gromov_barycenters` and :any:`ot.gromov.entropic_gromov_barycenters` for non-regularized and regularized barycenters respectively. @@ -654,19 +669,19 @@ The implementations of FGW and FGW barycenter is provided in functions Examples of computation of GW, regularized G and FGW are available in : - :any:`auto_examples/plot_gromov` - - :any:`auto_examples/plot_fgw` + - :any:`auto_examples/plot_fgw` Examples of GW, regularized GW and FGW barycenters are available in : - :any:`auto_examples/plot_gromov_barycenter` - - :any:`auto_examples/plot_barycenter_fgw` + - :any:`auto_examples/plot_barycenter_fgw` GPU acceleration ^^^^^^^^^^^^^^^^ We provide several implementation of our OT solvers in :any:`ot.gpu`. Those -implementations use the :code:`cupy` toolbox that obviously need to be installed. +implementations use the :code:`cupy` toolbox that obviously need to be installed. .. note:: @@ -701,7 +716,7 @@ FAQ 1. **How to solve a discrete optimal transport problem ?** The solver for discrete OT is the function :py:mod:`ot.emd` that returns - the OT transport matrix. If you want to solve a regularized OT you can + the OT transport matrix. If you want to solve a regularized OT you can use :py:mod:`ot.sinkhorn`. @@ -714,9 +729,9 @@ FAQ T=ot.emd(a,b,M) # exact linear program T_reg=ot.sinkhorn(a,b,M,reg) # entropic regularized OT - More detailed examples can be seen on this example: + More detailed examples can be seen on this example: :doc:`auto_examples/plot_OT_2D_samples` - + 2. **pip install POT fails with error : ImportError: No module named Cython.Build** @@ -726,7 +741,7 @@ FAQ installing POT. Note that this problem do not occur when using conda-forge since the packages - there are pre-compiled. + there are pre-compiled. See `Issue #59 `__ for more details. @@ -751,7 +766,7 @@ FAQ In order to limit import time and hard dependencies in POT. we do not import some sub-modules automatically with :code:`import ot`. In order to use the acceleration in :any:`ot.gpu` you need first to import is with - :code:`import ot.gpu`. + :code:`import ot.gpu`. See `Issue #85 `__ and :any:`ot.gpu` for more details. @@ -763,7 +778,7 @@ References .. [1] Bonneel, N., Van De Panne, M., Paris, S., & Heidrich, W. (2011, December). `Displacement nterpolation using Lagrangian mass transport `__. - In ACM Transactions on Graphics (TOG) (Vol. 30, No. 6, p. 158). ACM. + In ACM Transactions on Graphics (TOG) (Vol. 30, No. 6, p. 158). ACM. .. [2] Cuturi, M. (2013). `Sinkhorn distances: Lightspeed computation of optimal transport `__. In Advances @@ -874,4 +889,4 @@ References .. [24] Vayer, T., Chapel, L., Flamary, R., Tavenard, R. and Courty, N. (2019). `Optimal Transport for structured data with application on graphs `__ Proceedings - of the 36th International Conference on Machine Learning (ICML). \ No newline at end of file + of the 36th International Conference on Machine Learning (ICML). -- cgit v1.2.3