[DOC] Corrected spelling errors (#467)

* Fix typos in docstrings and examples
* A few more fixes
* Fix ref for `center_ot_dual` function
* Another typo
* Fix titles formatting
* Explicit empty line after math blocks
* Typo: asymmetric
* Fix code cell formatting for 1D barycenters
* Empirical
* Fix indentation for references
* Fixed all WARNINGs about title formatting
* Fix empty lines after math blocks
* Fix whitespace line
* Update changelog
* Consistent Gromov-Wasserstein
* More Gromov-Wasserstein consistency

---------

Co-authored-by: Rémi Flamary <remi.flamary@gmail.com>
author: Oleksii Kachaiev <kachayev@gmail.com> 2023-05-03 10:36:09 +0200
committer: GitHub <noreply@github.com> 2023-05-03 10:36:09 +0200
commit: 2aeb591be6b19a93f187516495ed15f1a47be925
tree: 9a6f759856a3f6b2d7c6db3514927ba3e5af10b5
parent: 8a7035bdaa5bb164d1c16febbd83650d1fb6d393
56 files changed, 232 insertions, 209 deletions
diff --git a/README.md b/README.md
index f0fb4bd..c16b328 100644
--- a/README.md
+++ b/README.md
@@ -212,7 +212,7 @@ You can also post bug reports and feature requests in Github issues. Make sure t
 
 [3] Benamou, J. D., Carlier, G., Cuturi, M., Nenna, L., & Peyré, G. (2015). [Iterative Bregman projections for regularized transportation problems](https://arxiv.org/pdf/1412.5154.pdf). SIAM Journal on Scientific Computing, 37(2), A1111-A1138.
 
-[4] S. Nakhostin, N. Courty, R. Flamary, D. Tuia, T. Corpetti, [Supervised planetary unmixing with optimal transport](https://hal.archives-ouvertes.fr/hal-01377236/document), Whorkshop on Hyperspectral Image and Signal Processing : Evolution in Remote Sensing (WHISPERS), 2016.
+[4] S. Nakhostin, N. Courty, R. Flamary, D. Tuia, T. Corpetti, [Supervised planetary unmixing with optimal transport](https://hal.archives-ouvertes.fr/hal-01377236/document), Workshop on Hyperspectral Image and Signal Processing : Evolution in Remote Sensing (WHISPERS), 2016.
 
 [5] N. Courty; R. Flamary; D. Tuia; A. Rakotomamonjy, [Optimal Transport for Domain Adaptation](https://arxiv.org/pdf/1507.00504.pdf), in IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.PP, no.99, pp.1-1
 
@@ -250,7 +250,7 @@ You can also post bug reports and feature requests in Github issues. Make sure t
 
 [22] J. Altschuler, J.Weed, P. Rigollet, (2017) [Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration](https://papers.nips.cc/paper/6792-near-linear-time-approximation-algorithms-for-optimal-transport-via-sinkhorn-iteration.pdf), Advances in Neural Information Processing Systems (NIPS) 31
 
-[23] Aude, G., Peyré, G., Cuturi, M., [Learning Generative Models with Sinkhorn Divergences](https://arxiv.org/abs/1706.00292), Proceedings of the Twenty-First International Conference on Artficial Intelligence and Statistics, (AISTATS) 21, 2018
+[23] Aude, G., Peyré, G., Cuturi, M., [Learning Generative Models with Sinkhorn Divergences](https://arxiv.org/abs/1706.00292), Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, (AISTATS) 21, 2018
 
 [24] Vayer, T., Chapel, L., Flamary, R., Tavenard, R. and Courty, N. (2019). [Optimal Transport for structured data with application on graphs](http://proceedings.mlr.press/v97/titouan19a.html) Proceedings of the 36th International Conference on Machine Learning (ICML).
 
diff --git a/RELEASES.md b/RELEASES.md
index b18fdc3..3366e2a 100644
--- a/RELEASES.md
+++ b/RELEASES.md
@@ -9,6 +9,8 @@
 
 - Fix circleci-redirector action and codecov (PR #460)
 - Fix issues with cuda for ot.binary_search_circle and with gradients for ot.sliced_wasserstein_sphere (PR #457)
+- Major documentation cleanup (PR #462, #467)
+- Fix gradients for "Wasserstein2 Minibatch GAN" example (PR #466)
 
 ## 0.9.0
 
@@ -87,7 +89,7 @@ big. More details below.
     
 
 #### New features
-- Added feature to (Fused) Gromov-Wasserstein solvers herited from `ot.optim` to support relative and absolute loss variations as stopping criterions (PR #431)
+- Added feature to (Fused) Gromov-Wasserstein solvers inherited from `ot.optim` to support relative and absolute loss variations as stopping criteria (PR #431)
 - Added feature to (Fused) Gromov-Wasserstein solvers to handle asymmetric matrices (PR #431)
 - Added semi-relaxed (Fused) Gromov-Wasserstein solvers in `ot.gromov` + examples (PR #431)
 - Added the spherical sliced-Wasserstein discrepancy in `ot.sliced.sliced_wasserstein_sphere` and `ot.sliced.sliced_wasserstein_sphere_unif` + examples (PR #434)
@@ -279,7 +281,7 @@ a [Generative Network
 (GAN)](https://PythonOT.github.io/auto_examples/backends/plot_wass2_gan_torch.html),
 for a  [sliced Wasserstein gradient
 flow](https://PythonOT.github.io/auto_examples/backends/plot_sliced_wass_grad_flow_pytorch.html)
-and [optimizing the Gromov-Wassersein distance](https://PythonOT.github.io/auto_examples/backends/plot_optim_gromov_pytorch.html). Note that the Jax backend is still in early development and quite
+and [optimizing the Gromov-Wasserstein distance](https://PythonOT.github.io/auto_examples/backends/plot_optim_gromov_pytorch.html). Note that the Jax backend is still in early development and quite
 slow at the moment, we strongly recommend for Jax users to use the [OTT
 toolbox](https://github.com/google-research/ott)  when possible.
  As a result of this new feature,
@@ -291,7 +293,7 @@ Pointwise Gromov
 Wasserstein](https://PythonOT.github.io/auto_examples/gromov/plot_gromov.html#compute-gw-with-a-scalable-stochastic-method-with-any-loss-function),
 Sinkhorn in log space with `method='sinkhorn_log'`, [Projection Robust
 Wasserstein](https://PythonOT.github.io/gen_modules/ot.dr.html?highlight=robust#ot.dr.projection_robust_wasserstein),
-ans [deviased Sinkorn barycenters](https://PythonOT.github.ioauto_examples/barycenters/plot_debiased_barycenter.html).
+and [debiased Sinkhorn barycenters](https://PythonOT.github.ioauto_examples/barycenters/plot_debiased_barycenter.html).
 
 This release will also simplify the installation process. We have now a
 `pyproject.toml` that defines the build dependency and POT should now build even
@@ -432,7 +434,7 @@ are coming for the next versions.
 
 #### Closed issues
 
-- Add JMLR paper to the readme and Mathieu Blondel to the Acknoledgments (PR
+- Add JMLR paper to the readme and Mathieu Blondel to the Acknowledgments (PR
   #231, #232)
 - Bug in Unbalanced OT example (Issue #127)
 - Clean Cython output when calling setup.py clean (Issue #122)
@@ -440,7 +442,7 @@ are coming for the next versions.
 - EMD dimension mismatch (Issue #114, Fixed in PR #116)
 - 2D barycenter bug for non square images (Issue #124, fixed in PR #132)
 - Bad value in EMD 1D (Issue #138, fixed in PR #139)
-- Log bugs for Gromov-Wassertein solver (Issue #107, fixed in PR #108)
+- Log bugs for Gromov-Wasserstein solver (Issue #107, fixed in PR #108)
 - Weight issues in barycenter function (PR #106)
 
 ## 0.6.0
@@ -471,9 +473,9 @@ a solver for [Unbalanced OT
 barycenters](https://github.com/rflamary/POT/blob/master/notebooks/plot_UOT_barycenter_1D.ipynb).
 A new variant of Gromov-Wasserstein divergence called [Fused
 Gromov-Wasserstein](https://pot.readthedocs.io/en/latest/all.html?highlight=fused_#ot.gromov.fused_gromov_wasserstein)
-has been also contributed with exemples of use on [structured
+has been also contributed with examples of use on [structured
 data](https://github.com/rflamary/POT/blob/master/notebooks/plot_fgw.ipynb) and
-computing [barycenters of labeld
+computing [barycenters of labeled
 graphs](https://github.com/rflamary/POT/blob/master/notebooks/plot_barycenter_fgw.ipynb).
 
 
@@ -534,7 +536,7 @@ and [free support](https://github.com/rflamary/POT/blob/master/notebooks/plot_fr
 implementation of entropic OT.
 
 POT 0.5 also comes with a rewriting of ot.gpu using the cupy framework instead of
-the unmaintained cudamat. Note that while we tried to keed changes to the
+the unmaintained cudamat. Note that while we tried to keep changes to the
 minimum, the OTDA classes were deprecated. If you are happy with the cudamat
 implementation, we recommend you stay with stable release 0.4 for now.
 
@@ -558,7 +560,7 @@ and new POT contributors (you can see the list in the [readme](https://github.co
 * Stochastic OT in the dual and semi-dual (PR #52 and PR #62)
 * Free support barycenters (PR #56)
 * Speed-up Sinkhorn function (PR #57 and PR #58)
-* Add convolutional Wassersein barycenters for 2D images (PR #64)
+* Add convolutional Wasserstein barycenters for 2D images (PR #64)
 * Add Greedy Sinkhorn variant (Greenkhorn) (PR #66)
 * Big ot.gpu update with cupy implementation (instead of un-maintained cudamat) (PR #67)
 
@@ -609,7 +611,7 @@ This release contains a lot of contribution from new contributors.
 * new notebooks for emd computation and Wasserstein Discriminant Analysis
 * relocate notebooks
 * update documentation
-* clean_zeros(a,b,M) for removimg zeros in sparse distributions
+* clean_zeros(a,b,M) for removing zeros in sparse distributions
 * GPU implementations for sinkhorn and group lasso regularization
 
 
@@ -617,7 +619,7 @@ This release contains a lot of contribution from new contributors.
 *7 Apr 2017*
 
 * New dimensionality reduction method (WDA)
-* Efficient method emd2 returns only tarnsport (in paralell if several histograms given)
+* Efficient method emd2 returns only transport (in parallel if several histograms given)
 
 
 
diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
index 1dc9f71..cd41a95 100644
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -151,7 +151,7 @@ case you are only solving an approximation of the Wasserstein distance because
 the 1-Lipschitz constraint on the dual cannot be enforced exactly (approximated
 through filter thresholding or regularization). Finally note that in order to
 avoid solving large scale OT problems, a number of recent approached minimized
-the expected Wasserstein distance on minibtaches that is different from the
+the expected Wasserstein distance on minibatches that is different from the
 Wasserstein but has better computational and
 `statistical properties <https://arxiv.org/pdf/1910.04091.pdf>`_.
 
@@ -164,8 +164,8 @@ Optimal transport and Wasserstein distance
     In POT, most functions that solve OT or regularized OT problems have two
     versions that return the OT matrix or the value of the optimal solution. For
     instance :any:`ot.emd` returns the OT matrix and :any:`ot.emd2` returns the
-    Wassertsein distance. This approach has been implemented in practice for all
-    solvers that return an OT matrix (even Gromov-Wasserstsein).
+    Wasserstein distance. This approach has been implemented in practice for all
+    solvers that return an OT matrix (even Gromov-Wasserstein).
 
 .. _kantorovitch_solve:
 
@@ -349,9 +349,9 @@ More details about the algorithms used are given in the following note.
       classic algorithm [2]_.
     + :code:`method='sinkhorn_log'` calls :any:`ot.bregman.sinkhorn_log`  the
       sinkhorn algorithm in log space [2]_ that is more stable but can be
-      slower in numpy since `logsumexp` is not implmemented in parallel. 
+      slower in numpy since `logsumexp` is not implemented in parallel.
       It is the recommended solver for applications that requires
-      differentiability with a  small number of iterations.
+      differentiability with a small number of iterations.
     + :code:`method='sinkhorn_stabilized'` calls :any:`ot.bregman.sinkhorn_stabilized`  the
       log stabilized version of the algorithm [9]_.
     + :code:`method='sinkhorn_epsilon_scaling'` calls
@@ -368,7 +368,7 @@ More details about the algorithms used are given in the following note.
     function to solve the smooth problem with :code:`L-BFGS-B` algorithm. Tu use
     this solver, use functions :any:`ot.smooth.smooth_ot_dual` or
     :any:`ot.smooth.smooth_ot_semi_dual` with parameter :code:`reg_type='kl'` to
-    choose entropic/Kullbach Leibler regularization.
+    choose entropic/Kullback-Leibler regularization.
 
     **Choosing a Sinkhorn solver**
 
@@ -378,7 +378,7 @@ More details about the algorithms used are given in the following note.
     :any:`ot.bregman.sinkhorn_stabilized` solver that will avoid numerical
     errors. This last solver can be very slow in practice and might not even
     converge to a reasonable OT matrix in a finite time. This is why
-    :any:`ot.bregman.sinkhorn_epsilon_scaling` that relie on iterating the value
+    :any:`ot.bregman.sinkhorn_epsilon_scaling` that relies on iterating the value
     of the regularization (and using warm start) sometimes leads to better
     solutions. Note that the greedy version of the Sinkhorn
     :any:`ot.bregman.greenkhorn` can also lead to a speedup and the screening
@@ -546,7 +546,7 @@ where :math:`b_k` are also weights in the simplex. In the non-regularized case,
 the problem above is a classical linear program. In this case we propose a
 solver :meth:`ot.lp.barycenter` that relies on generic LP solvers. By default the
 function uses :any:`scipy.optimize.linprog`, but more efficient LP solvers from
-cvxopt can be also used by changing parameter :code:`solver`. Note that this problem
+`cvxopt` can be also used by changing parameter :code:`solver`. Note that this problem
 requires to solve a very large linear program and can be very slow in
 practice.
 
@@ -812,7 +812,7 @@ Gromov Wasserstein(GW)
 Gromov Wasserstein (GW) is a generalization of OT to distributions that do not lie in
 the same space [13]_. In this case one cannot compute distance between samples
 from the two distributions. [13]_ proposed instead to realign the metric spaces
-by computing a transport between distance matrices. The Gromow Wasserstein
+by computing a transport between distance matrices. The Gromov Wasserstein
 alignment between two distributions can be expressed as the one minimizing:
 
 .. math::
@@ -837,7 +837,7 @@ There also exists an entropic regularized variant of GW that has been proposed i
     :heading-level: "
 
 Gromov Wasserstein barycenters
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Note that similarly to Wasserstein distance GW allows for the definition of GW
 barycenters that can be expressed as
@@ -1134,7 +1134,7 @@ References
 
 .. [23] Genevay, A., Peyré, G., Cuturi, M., `Learning Generative Models with
     Sinkhorn Divergences <https://arxiv.org/abs/1706.00292>`__, Proceedings
-    of the Twenty-First International Conference on Artficial Intelligence
+    of the Twenty-First International Conference on Artificial Intelligence
     and Statistics, (AISTATS) 21, 2018
 
 .. [24] Vayer, T., Chapel, L., Flamary, R., Tavenard, R. and Courty, N.
@@ -1187,18 +1187,18 @@ References
     In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10648-10656).
 
 .. [36] Liutkus, A., Simsekli, U., Majewski, S., Durmus, A., & Stöter, F. R. 
-       (2019, May). `Sliced-Wasserstein flows: Nonparametric generative modeling via
-        optimal transport and diffusions
-        <http://proceedings.mlr.press/v97/liutkus19a/liutkus19a.pdf>`_. In International
-        Conference on Machine Learning (pp. 4104-4113). PMLR.
+    (2019, May). `Sliced-Wasserstein flows: Nonparametric generative modeling via
+    optimal transport and diffusions
+    <http://proceedings.mlr.press/v97/liutkus19a/liutkus19a.pdf>`_. In International
+    Conference on Machine Learning (pp. 4104-4113). PMLR.
 
 .. [37] Janati, H., Cuturi, M., Gramfort, A. `Debiased sinkhorn barycenters 
     <http://proceedings.mlr.press/v119/janati20a/janati20a.pdf>`_ Proceedings of
     the 37th International Conference on Machine Learning, PMLR 119:4692-4701, 2020
 
 .. [38] C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, `Online
-       Graph Dictionary Learning <https://arxiv.org/pdf/2102.06555.pdf>`_\ , 
-       International Conference on Machine Learning (ICML), 2021.
+    Graph Dictionary Learning <https://arxiv.org/pdf/2102.06555.pdf>`_\ , 
+    International Conference on Machine Learning (ICML), 2021.
 
 .. [39] Gozlan, N., Roberto, C., Samson, P. M., & Tetali, P. (2017).
     `Kantorovich duality for general transport costs and applications
diff --git a/examples/backends/plot_dual_ot_pytorch.py b/examples/backends/plot_dual_ot_pytorch.py
index d3f7a66..67c7077 100644
--- a/examples/backends/plot_dual_ot_pytorch.py
+++ b/examples/backends/plot_dual_ot_pytorch.py
@@ -100,7 +100,7 @@ pl.xlabel("Iterations")
 Ge = ot.stochastic.plan_dual_entropic(u, v, xs, xt, reg=reg)
 
 # %%
-# Plot teh estimated entropic OT plan
+# Plot the estimated entropic OT plan
 # -----------------------------------
 
 pl.figure(3, (10, 5))
@@ -114,7 +114,7 @@ pl.title('Source and target distributions')
 
 # %%
 # Estimating dual variables for quadratic OT
-# -----------------------------------------
+# ------------------------------------------
 
 u = torch.randn(n_source_samples, requires_grad=True)
 v = torch.randn(n_source_samples, requires_grad=True)
@@ -157,7 +157,7 @@ Gq = ot.stochastic.plan_dual_quadratic(u, v, xs, xt, reg=reg)
 
 # %%
 # Plot the estimated quadratic OT plan
-# -----------------------------------
+# ------------------------------------
 
 pl.figure(5, (10, 5))
 pl.clf()
diff --git a/examples/backends/plot_optim_gromov_pytorch.py b/examples/backends/plot_optim_gromov_pytorch.py
index cdc1587..0ae2890 100644
--- a/examples/backends/plot_optim_gromov_pytorch.py
+++ b/examples/backends/plot_optim_gromov_pytorch.py
@@ -1,7 +1,7 @@
 r"""
-=================================
+=======================================================
 Optimizing the Gromov-Wasserstein distance with PyTorch
-=================================
+=======================================================
 
 In this example, we use the pytorch backend to optimize the Gromov-Wasserstein
 (GW) loss between two graphs expressed as empirical distribution.
@@ -11,7 +11,7 @@ graph so that it minimizes the GW with a given Stochastic Block Model graph.
 We can see that this actually recovers the proportion of classes in the SBM
 and allows for an accurate clustering of the nodes using the GW optimal plan.
 
-In the second part, we optimize simultaneously the weights and the sructure of
+In the second part, we optimize simultaneously the weights and the structure of
 the template graph which allows us to perform graph compression and to recover
 other properties of the SBM.
 
@@ -38,7 +38,7 @@ from ot.gromov import gromov_wasserstein2
 
 # %%
 # Graph generation
-# ---------------
+# ----------------
 
 rng = np.random.RandomState(42)
 
@@ -95,8 +95,8 @@ pl.axis("off")
 
 # %%
 # Optimizing GW w.r.t. the weights on a template structure
-# ------------------------------------------------
-# The adajacency matrix C1 is block diagonal with 3 blocks. We want to
+# --------------------------------------------------------
+# The adjacency matrix C1 is block diagonal with 3 blocks. We want to
 # optimize the weights of a simple template C0=eye(3) and see if we can
 # recover the proportion of classes from the SBM (up to a permutation).
 
@@ -155,7 +155,7 @@ print("True proportions : ", ratio)
 
 # %%
 # Community clustering with uniform and estimated weights
-# --------------------------------------------
+# -------------------------------------------------------
 # The GW OT  plan can be used to perform a clustering of the nodes of a graph
 # when computing the GW with a simple template like C0 by labeling nodes in
 # the original graph using by the index of the noe in the template receiving
@@ -193,7 +193,7 @@ pl.axis("off")
 # classes
 
 
-def graph_compession_gw(nb_nodes, C2, a2, nb_iter_max=100, lr=1e-2):
+def graph_compression_gw(nb_nodes, C2, a2, nb_iter_max=100, lr=1e-2):
     """ solve min_a GW(C1,C2,a, a2) by gradient descent"""
 
     # use pyTorch for our data
@@ -237,8 +237,8 @@ def graph_compession_gw(nb_nodes, C2, a2, nb_iter_max=100, lr=1e-2):
 
 
 nb_nodes = 3
-a0_est2, C0_est2, loss_iter2 = graph_compession_gw(nb_nodes, C1, ot.unif(n),
-                                                   nb_iter_max=100, lr=5e-2)
+a0_est2, C0_est2, loss_iter2 = graph_compression_gw(nb_nodes, C1, ot.unif(n),
+                                                    nb_iter_max=100, lr=5e-2)
 
 pl.figure(4)
 pl.plot(loss_iter2)
diff --git a/examples/backends/plot_sliced_wass_grad_flow_pytorch.py b/examples/backends/plot_sliced_wass_grad_flow_pytorch.py
index f00de50..07a4926 100644
--- a/examples/backends/plot_sliced_wass_grad_flow_pytorch.py
+++ b/examples/backends/plot_sliced_wass_grad_flow_pytorch.py
@@ -1,16 +1,16 @@
 r"""
-=================================
+============================================================
 Sliced Wasserstein barycenter and gradient flow with PyTorch
-=================================
+============================================================
 
-In this exemple we use the pytorch backend to optimize the sliced Wasserstein
+In this example we use the pytorch backend to optimize the sliced Wasserstein
 loss between two empirical distributions [31].
 
 In the first example one we perform a
 gradient flow on the support of a distribution that minimize the sliced
-Wassersein distance as poposed in [36].
+Wasserstein distance as proposed in [36].
 
-In the second exemple we optimize with a gradient descent the sliced
+In the second example we optimize with a gradient descent the sliced
 Wasserstein barycenter between two distributions as in [31].
 
 [31] Bonneel, Nicolas, et al. "Sliced and radon wasserstein barycenters of
diff --git a/examples/backends/plot_ssw_unif_torch.py b/examples/backends/plot_ssw_unif_torch.py
index 7ccc2af..afe3fa6 100644
--- a/examples/backends/plot_ssw_unif_torch.py
+++ b/examples/backends/plot_ssw_unif_torch.py
@@ -119,7 +119,7 @@ for i in range(9):
 
 # %%
 # Animate trajectories of generated samples along iteration
-# -------------------------------------------------------
+# ---------------------------------------------------------
 
 pl.figure(4, (8, 8))
 
diff --git a/examples/backends/plot_stoch_continuous_ot_pytorch.py b/examples/backends/plot_stoch_continuous_ot_pytorch.py
index 6d9b916..714a5d3 100644
--- a/examples/backends/plot_stoch_continuous_ot_pytorch.py
+++ b/examples/backends/plot_stoch_continuous_ot_pytorch.py
@@ -125,8 +125,8 @@ pl.xlabel("Iterations")
 
 
 # %%
-# Plot the density on arget for a given source sample
-# ---------------------------------------------------
+# Plot the density on target for a given source sample
+# ----------------------------------------------------
 
 
 nv = 100
@@ -155,7 +155,7 @@ Gg = Gg.reshape((nv, nv)).detach().numpy()
 pl.scatter(Xs[:nvisu, 0], Xs[:nvisu, 1], marker='+', zorder=2, alpha=0.05)
 pl.scatter(Xt[:nvisu, 0], Xt[:nvisu, 1], marker='o', zorder=2, alpha=0.05)
 pl.scatter(Xs[iv:iv + 1, 0], Xs[iv:iv + 1, 1], s=100, marker='+', label='Source sample', zorder=2, alpha=1, color='C0')
-pl.pcolormesh(XX, YY, Gg, cmap='Greens', label='Density of transported sourec sample')
+pl.pcolormesh(XX, YY, Gg, cmap='Greens', label='Density of transported source sample')
 pl.legend(loc=0)
 ax_bounds = pl.axis()
 pl.title('Density of transported source sample')
@@ -169,7 +169,7 @@ Gg = Gg.reshape((nv, nv)).detach().numpy()
 pl.scatter(Xs[:nvisu, 0], Xs[:nvisu, 1], marker='+', zorder=2, alpha=0.05)
 pl.scatter(Xt[:nvisu, 0], Xt[:nvisu, 1], marker='o', zorder=2, alpha=0.05)
 pl.scatter(Xs[iv:iv + 1, 0], Xs[iv:iv + 1, 1], s=100, marker='+', label='Source sample', zorder=2, alpha=1, color='C0')
-pl.pcolormesh(XX, YY, Gg, cmap='Greens', label='Density of transported sourec sample')
+pl.pcolormesh(XX, YY, Gg, cmap='Greens', label='Density of transported source sample')
 pl.legend(loc=0)
 ax_bounds = pl.axis()
 pl.title('Density of transported source sample')
@@ -183,7 +183,7 @@ Gg = Gg.reshape((nv, nv)).detach().numpy()
 pl.scatter(Xs[:nvisu, 0], Xs[:nvisu, 1], marker='+', zorder=2, alpha=0.05)
 pl.scatter(Xt[:nvisu, 0], Xt[:nvisu, 1], marker='o', zorder=2, alpha=0.05)
 pl.scatter(Xs[iv:iv + 1, 0], Xs[iv:iv + 1, 1], s=100, marker='+', label='Source sample', zorder=2, alpha=1, color='C0')
-pl.pcolormesh(XX, YY, Gg, cmap='Greens', label='Density of transported sourec sample')
+pl.pcolormesh(XX, YY, Gg, cmap='Greens', label='Density of transported source sample')
 pl.legend(loc=0)
 ax_bounds = pl.axis()
 pl.title('Density of transported source sample')
diff --git a/examples/backends/plot_unmix_optim_torch.py b/examples/backends/plot_unmix_optim_torch.py
index 9ae66e9..e47a5e0 100644
--- a/examples/backends/plot_unmix_optim_torch.py
+++ b/examples/backends/plot_unmix_optim_torch.py
@@ -135,7 +135,7 @@ for i in range(niter):
 
 ##############################################################################
 # Estimated weights and convergence of the objective
-# ---------------------------------------------------
+# --------------------------------------------------
 
 we = w.detach().numpy()
 print('Estimated mixture:', we)
@@ -147,8 +147,8 @@ pl.title('Wasserstein distance')
 pl.xlabel("Iterations")
 
 ##############################################################################
-# Ploting the reweighted source distribution
-# ------------------------------------------
+# Plotting the reweighted source distribution
+# -------------------------------------------
 
 pl.figure(3)
 
diff --git a/examples/backends/plot_wass1d_torch.py b/examples/backends/plot_wass1d_torch.py
index cd8e2fd..5a85795 100644
--- a/examples/backends/plot_wass1d_torch.py
+++ b/examples/backends/plot_wass1d_torch.py
@@ -94,7 +94,7 @@ pl.show()
 
 # %%
 # Wasserstein barycenter
-# ---------
+# ----------------------
 # In this example, we consider the following Wasserstein barycenter problem
 # $$ \\eta^* = \\min_\\eta\;\;\; (1-t)W(\\mu,\\eta) + tW(\\eta,\\nu)$$
 # where :math:`\\mu` and :math:`\\nu` are reference 1D measures, and :math:`t`
diff --git a/examples/backends/plot_wass2_gan_torch.py b/examples/backends/plot_wass2_gan_torch.py
index cc82f4f..f39d186 100644
--- a/examples/backends/plot_wass2_gan_torch.py
+++ b/examples/backends/plot_wass2_gan_torch.py
@@ -19,13 +19,13 @@ optimization problem:
 
 
 In practice we do not have access to the full distribution :math:`\mu_d` but
-samples and we cannot compute the Wasserstein distance for lare dataset.
+samples and we cannot compute the Wasserstein distance for large datasets.
 [Arjovsky2017] proposed to approximate the dual potential of Wasserstein 1
 with a neural network recovering an optimization problem similar to GAN.
 In this example
 we will optimize the expectation of the Wasserstein distance over minibatches
 at each iterations as proposed in [Genevay2018]. Optimizing the Minibatches
-of the Wasserstein distance  has been studied in[Fatras2019].
+of the Wasserstein distance has been studied in [Fatras2019].
 
 [Arjovsky2017] Arjovsky, M., Chintala, S., & Bottou, L. (2017, July).
 Wasserstein generative adversarial networks. In International conference
@@ -183,7 +183,7 @@ for i in range(9):
 
 # %%
 # Animate trajectories of generated samples along iteration
-# -------------------------------------------------------
+# ---------------------------------------------------------
 
 pl.figure(4, (8, 8))
 
diff --git a/examples/barycenters/plot_barycenter_1D.py b/examples/barycenters/plot_barycenter_1D.py
index 8096245..40dc444 100644
--- a/examples/barycenters/plot_barycenter_1D.py
+++ b/examples/barycenters/plot_barycenter_1D.py
@@ -4,7 +4,7 @@
 1D Wasserstein barycenter demo
 ==============================
 
-This example illustrates the computation of regularized Wassersyein Barycenter
+This example illustrates the computation of regularized Wasserstein Barycenter
 as proposed in [3].
 
 
@@ -80,6 +80,7 @@ plt.show()
 ##############################################################################
 # Barycentric interpolation
 # -------------------------
+
 #%% barycenter interpolation
 
 n_alpha = 11
diff --git a/examples/barycenters/plot_free_support_barycenter.py b/examples/barycenters/plot_free_support_barycenter.py
index f4a13dd..b6a4a11 100644
--- a/examples/barycenters/plot_free_support_barycenter.py
+++ b/examples/barycenters/plot_free_support_barycenter.py
@@ -5,7 +5,7 @@
 ========================================================
 
 Illustration of 2D Wasserstein and Sinkhorn barycenters if distributions are weighted
-sum of diracs.
+sum of Diracs.
 
 """
 
@@ -50,7 +50,7 @@ pl.title('Distributions')
 
 # %%
 # Compute free support Wasserstein barycenter
-# -------------------------------
+# -------------------------------------------
 
 k = 200  # number of Diracs of the barycenter
 X_init = np.random.normal(0., 1., (k, d))  # initial Dirac locations
@@ -60,7 +60,7 @@ X = ot.lp.free_support_barycenter(measures_locations, measures_weights, X_init,
 
 # %%
 # Plot the Wasserstein barycenter
-# ---------
+# -------------------------------
 
 pl.figure(2, (8, 3))
 pl.scatter(x1[:, 0], x1[:, 1], alpha=0.5)
@@ -81,7 +81,7 @@ X = ot.bregman.free_support_sinkhorn_barycenter(measures_locations, measures_wei
 
 # %%
 # Plot the Wasserstein barycenter
-# ---------
+# -------------------------------
 
 pl.figure(2, (8, 3))
 pl.scatter(x1[:, 0], x1[:, 1], alpha=0.5)
diff --git a/examples/barycenters/plot_generalized_free_support_barycenter.py b/examples/barycenters/plot_generalized_free_support_barycenter.py
index e685ec7..a4d081b 100644
--- a/examples/barycenters/plot_generalized_free_support_barycenter.py
+++ b/examples/barycenters/plot_generalized_free_support_barycenter.py
@@ -57,7 +57,7 @@ weights = np.array([1 / 3, 1 / 3, 1 / 3])
 # Number of barycenter points to compute
 n_samples_bary = 150
 
-# Send the input measures into 3D space for visualisation
+# Send the input measures into 3D space for visualization
 X_visu = [Xi @ Pi for (Xi, Pi) in zip(X_list, P_list)]
 
 # Plot the input data
diff --git a/examples/domain-adaptation/plot_otda_semi_supervised.py b/examples/domain-adaptation/plot_otda_semi_supervised.py
index 478c3b8..278c8dd 100644
--- a/examples/domain-adaptation/plot_otda_semi_supervised.py
+++ b/examples/domain-adaptation/plot_otda_semi_supervised.py
@@ -50,7 +50,7 @@ ot_sinkhorn_semi = ot.da.SinkhornTransport(reg_e=1e-1)
 ot_sinkhorn_semi.fit(Xs=Xs, Xt=Xt, ys=ys, yt=yt)
 transp_Xs_sinkhorn_semi = ot_sinkhorn_semi.transform(Xs=Xs)
 
-# semi supervised DA uses available labaled target samples to modify the cost
+# semi supervised DA uses available labeled target samples to modify the cost
 # matrix involved in the OT problem. The cost of transporting a source sample
 # of class A onto a target sample of class B != A is set to infinite, or a
 # very large value
@@ -92,7 +92,7 @@ pl.subplot(2, 2, 4)
 pl.imshow(ot_sinkhorn_semi.cost_, interpolation='nearest')
 pl.xticks([])
 pl.yticks([])
-pl.title('Cost matrix - semisupervised DA')
+pl.title('Cost matrix - semi-supervised DA')
 
 pl.tight_layout()
 
diff --git a/examples/gromov/plot_barycenter_fgw.py b/examples/gromov/plot_barycenter_fgw.py
index dc3c6aa..3b5db8b 100644
--- a/examples/gromov/plot_barycenter_fgw.py
+++ b/examples/gromov/plot_barycenter_fgw.py
@@ -34,8 +34,8 @@ from ot.gromov import fgw_barycenters
 
 def find_thresh(C, inf=0.5, sup=3, step=10):
     """ Trick to find the adequate thresholds from where value of the C matrix are considered close enough to say that nodes are connected
-        Tthe threshold is found by a linesearch between values "inf" and "sup" with "step" thresholds tested.
-        The optimal threshold is the one which minimizes the reconstruction error between the shortest_path matrix coming from the thresholded adjency matrix
+        The threshold is found by a linesearch between values "inf" and "sup" with "step" thresholds tested.
+        The optimal threshold is the one which minimizes the reconstruction error between the shortest_path matrix coming from the thresholded adjacency matrix
         and the original matrix.
     Parameters
     ----------
@@ -51,15 +51,15 @@ def find_thresh(C, inf=0.5, sup=3, step=10):
     dist = []
     search = np.linspace(inf, sup, step)
     for thresh in search:
-        Cprime = sp_to_adjency(C, 0, thresh)
+        Cprime = sp_to_adjacency(C, 0, thresh)
         SC = shortest_path(Cprime, method='D')
         SC[SC == float('inf')] = 100
         dist.append(np.linalg.norm(SC - C))
     return search[np.argmin(dist)], dist
 
 
-def sp_to_adjency(C, threshinf=0.2, threshsup=1.8):
-    """ Thresholds the structure matrix in order to compute an adjency matrix.
+def sp_to_adjacency(C, threshinf=0.2, threshsup=1.8):
+    """ Thresholds the structure matrix in order to compute an adjacency matrix.
     All values between threshinf and threshsup are considered representing connected nodes and set to 1. Else are set to 0
     Parameters
     ----------
@@ -174,7 +174,7 @@ A, C, log = fgw_barycenters(sizebary, Ys, Cs, ps, lambdas, alpha=0.95, log=True)
 # -------------------------
 
 #%% Create the barycenter
-bary = nx.from_numpy_array(sp_to_adjency(C, threshinf=0, threshsup=find_thresh(C, sup=100, step=100)[0]))
+bary = nx.from_numpy_array(sp_to_adjacency(C, threshinf=0, threshsup=find_thresh(C, sup=100, step=100)[0]))
 for i, v in enumerate(A.ravel()):
     bary.add_node(i, attr_name=v)
 
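The `sp_to_adjacency` docstring says that entries of the structure matrix lying between `threshinf` and `threshsup` are treated as connected nodes (set to 1), and all others are set to 0. Here is a hypothetical NumPy sketch of that rule. The name `sp_to_adjacency_sketch`, the use of strict inequalities and the removal of self-loops are assumptions. The sketch does not reproduce the example's exact body.

```python
import numpy as np


def sp_to_adjacency_sketch(C, threshinf=0.2, threshsup=1.8):
    """Hypothetical thresholding of a structure (shortest-path) matrix.

    Entries strictly between threshinf and threshsup become edges (1);
    every other entry, including the diagonal, becomes 0.
    """
    C = np.asarray(C, dtype=float)
    A = ((C > threshinf) & (C < threshsup)).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


# Shortest-path matrix of the path graph 0 - 1 - 2
C = np.array([[0, 1, 2],
              [1, 0, 1],
              [2, 1, 0]])
A = sp_to_adjacency_sketch(C, threshinf=0, threshsup=1.5)
```

With these thresholds, only pairs at shortest-path distance 1 are kept, which recovers the edges of the original path graph.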
diff --git a/examples/gromov/plot_fgw.py b/examples/gromov/plot_fgw.py
index 5475fb3..bf10de6 100644
--- a/examples/gromov/plot_fgw.py
+++ b/examples/gromov/plot_fgw.py
@@ -1,7 +1,7 @@
 # -*- coding: utf-8 -*-
 """
 ==============================
-Plot Fused-gromov-Wasserstein
+Plot Fused-Gromov-Wasserstein
 ==============================
 
 This example illustrates the computation of FGW for 1D measures [18].
diff --git a/examples/gromov/plot_gromov.py b/examples/gromov/plot_gromov.py
index 05074dc..afb5bdc 100644
--- a/examples/gromov/plot_gromov.py
+++ b/examples/gromov/plot_gromov.py
@@ -3,7 +3,7 @@
 ==========================
 Gromov-Wasserstein example
 ==========================
-This example is designed to show how to use the Gromov-Wassertsein distance
+This example is designed to show how to use the Gromov-Wasserstein distance
 computation in POT.
 """
 
diff --git a/examples/gromov/plot_gromov_barycenter.py b/examples/gromov/plot_gromov_barycenter.py
index 08ec610..1b9abbf 100755
--- a/examples/gromov/plot_gromov_barycenter.py
+++ b/examples/gromov/plot_gromov_barycenter.py
@@ -36,7 +36,7 @@ import ot
 def smacof_mds(C, dim, max_iter=3000, eps=1e-9):
     """
     Returns an interpolated point cloud following the dissimilarity matrix C
-    using SMACOF multidimensional scaling (MDS) in specific dimensionned
+    using SMACOF multidimensional scaling (MDS) in specific dimensioned
     target space
 
     Parameters
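The `smacof_mds` docstring refers to SMACOF multidimensional scaling. To show roughly what SMACOF does, here is a from-scratch NumPy sketch that repeatedly applies the Guttman transform with unit weights, starting from a single random initialization. The names `smacof_sketch`, `pairwise_dist` and `stress` are hypothetical, and the sketch is not meant to reproduce the example's own implementation.

```python
import numpy as np


def pairwise_dist(X):
    # Euclidean distance matrix between the rows of X
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def stress(D, X):
    # Raw stress: sum over pairs i < j of (D_ij - d_ij(X))^2, D assumed symmetric
    return 0.5 * np.sum((D - pairwise_dist(X)) ** 2)


def smacof_sketch(D, dim=2, max_iter=300, eps=1e-9, seed=0):
    """Hypothetical SMACOF with unit weights, iterating the Guttman transform."""
    n = D.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, dim))
    s_old = stress(D, X)
    for _ in range(max_iter):
        dX = pairwise_dist(X)
        # ratio_ij = D_ij / d_ij(X) where d_ij(X) > 0, else 0 (this covers the diagonal)
        safe = np.where(dX > 0, dX, 1.0)
        ratio = np.where(dX > 0, D / safe, 0.0)
        B = -ratio
        np.fill_diagonal(B, ratio.sum(axis=1))  # B_ii = sum_{j != i} ratio_ij
        X = B @ X / n  # Guttman transform
        s_new = stress(D, X)
        if s_old - s_new < eps:  # stress is non-increasing; stop once it stalls
            break
        s_old = s_new
    return X, stress(D, X)


pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = pairwise_dist(pts)
X_emb, final_stress = smacof_sketch(D, dim=2)
```

Because each Guttman step cannot increase the stress, running more iterations from the same seed never gives a worse stress. Practical implementations usually also try several random starts, because the stress has local minima.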
diff --git a/examples/gromov/plot_gromov_wasserstein_dictionary_learning.py b/examples/gromov/plot_gromov_wasserstein_dictionary_learning.py
index 7585944..8cccf88 100755
--- a/examples/gromov/plot_gromov_wasserstein_dictionary_learning.py
+++ b/examples/gromov/plot_gromov_wasserstein_dictionary_learning.py
@@ -1,11 +1,11 @@
 # -*- coding: utf-8 -*-
 
 r"""
-=================================
+=====================================================
 (Fused) Gromov-Wasserstein Linear Dictionary Learning
-=================================
+=====================================================
 
-In this exemple, we illustrate how to learn a Gromov-Wasserstein dictionary on
+In this example, we illustrate how to learn a Gromov-Wasserstein dictionary on
 a dataset of structured data such as graphs, denoted
 :math:`\{ \mathbf{C_s} \}_{s \in [S]}` where every nodes have uniform weights.
 Given a dictionary :math:`\mathbf{C_{dict}}` composed of D structures of a fixed
@@ -49,7 +49,7 @@ from networkx.generators.community import stochastic_block_model as sbm
 #############################################################################
 #
 # Generate a dataset composed of graphs following Stochastic Block models of 1, 2 and 3 clusters.
-# ---------------------------------------------
+# -----------------------------------------------------------------------------------------------
 
 np.random.seed(42)
 
@@ -112,8 +112,8 @@ pl.show()
 
 #############################################################################
 #
-# Estimate the gromov-wasserstein dictionary from the dataset
-# ---------------------------------------------
+# Estimate the Gromov-Wasserstein dictionary from the dataset
+# -----------------------------------------------------------
 
 
 np.random.seed(0)
@@ -144,7 +144,7 @@ pl.show()
 #############################################################################
 #
 # Visualization of the estimated dictionary atoms
-# ---------------------------------------------
+# -----------------------------------------------
 
 
 # Continuous connections between nodes of the atoms are colored in shades of grey (1: dark / 2: white)
@@ -169,7 +169,7 @@ pl.show()
 #############################################################################
 #
 # Visualization of the embedding space
-# ---------------------------------------------
+# ------------------------------------
 
 unmixings = []
 reconstruction_errors = []
@@ -217,7 +217,7 @@ pl.show()
 #############################################################################
 #
 # Endow the dataset with node features
-# ---------------------------------------------
+# ------------------------------------
 # We follow this feature assignment on all nodes of a graph depending on its label/number of clusters
 # 1 cluster --> 0 as nodes feature
 # 2 clusters --> 1 as nodes feature
@@ -257,7 +257,7 @@ pl.show()
 #############################################################################
 #
 # Estimate a Fused Gromov-Wasserstein dictionary from the dataset of attributed graphs
-# ---------------------------------------------
+# ------------------------------------------------------------------------------------
 np.random.seed(0)
 ps = [ot.unif(C.shape[0]) for C in dataset]
 D = 3  # 6 atoms instead of 3
@@ -286,7 +286,7 @@ pl.show()
 #############################################################################
 #
 # Visualization of the estimated dictionary atoms
-# ---------------------------------------------
+# -----------------------------------------------
 
 pl.figure(7, (12, 8))
 pl.clf()
@@ -313,7 +313,7 @@ pl.show()
 #############################################################################
 #
 # Visualization of the embedding space
-# ---------------------------------------------
+# ------------------------------------
 
 unmixings = []
 reconstruction_errors = []
diff --git a/examples/gromov/plot_semirelaxed_fgw.py b/examples/gromov/plot_semirelaxed_fgw.py
index ef4b286..579f23d 100644
--- a/examples/gromov/plot_semirelaxed_fgw.py
+++ b/examples/gromov/plot_semirelaxed_fgw.py
@@ -1,8 +1,8 @@
 # -*- coding: utf-8 -*-
 """
-==========================
+===============================================
 Semi-relaxed (Fused) Gromov-Wasserstein example
-==========================
+===============================================
 
 This example is designed to show how to use the semi-relaxed Gromov-Wasserstein
 and the semi-relaxed Fused Gromov-Wasserstein divergences.
@@ -34,7 +34,7 @@ from networkx.generators.community import stochastic_block_model as sbm
 #############################################################################
 #
 # Generate two graphs following Stochastic Block models of 2 and 3 clusters.
-# ---------------------------------------------
+# --------------------------------------------------------------------------
 
 
 N2 = 20  # 2 communities
@@ -85,7 +85,7 @@ for i, j in G3.edges():
 #############################################################################
 #
 # Compute their semi-relaxed Gromov-Wasserstein divergences
-# ---------------------------------------------
+# ---------------------------------------------------------
 
 # 0) GW(C2, h2, C3, h3) for reference
 OT, log = gromov_wasserstein(C2, C3, h2, h3, symmetric=True, log=True)
@@ -110,7 +110,7 @@ print('srGW(C3, h3, C2) = ', srgw_32)
 #############################################################################
 #
 # Visualization of the semi-relaxed Gromov-Wasserstein matchings
-# ---------------------------------------------
+# --------------------------------------------------------------
 #
 # We color nodes of the graph on the right - then project its node colors
 # based on the optimal transport plan from the srGW matching
@@ -226,7 +226,7 @@ pl.show()
 #############################################################################
 #
 # Add node features
-# ---------------------------------------------
+# -----------------
 
 # We add node features with given mean - by clusters
 # and inversely proportional to clusters' intra-connectivity
@@ -242,7 +242,7 @@ for i, c in enumerate(part_G3):
 #############################################################################
 #
 # Compute their semi-relaxed Fused Gromov-Wasserstein divergences
-# ---------------------------------------------
+# ---------------------------------------------------------------
 
 alpha = 0.5
 # Compute pairwise euclidean distance between node features
@@ -272,7 +272,7 @@ print('srGW(C3, F3, h3, C2, F2) = ', srfgw_32)
 #############################################################################
 #
 # Visualization of the semi-relaxed Fused Gromov-Wasserstein matchings
-# ---------------------------------------------
+# --------------------------------------------------------------------
 #
 # We color nodes of the graph on the right - then project its node colors
 # based on the optimal transport plan from the srFGW matching
diff --git a/examples/others/plot_WeakOT_VS_OT.py b/examples/others/plot_WeakOT_VS_OT.py
index a29c875..e3164ba 100644
--- a/examples/others/plot_WeakOT_VS_OT.py
+++ b/examples/others/plot_WeakOT_VS_OT.py
@@ -5,7 +5,7 @@ Weak Optimal Transport VS exact Optimal Transport
 ====================================================
 
 Illustration of 2D optimal transport between distributions that are weighted
-sum of diracs. The OT matrix is plotted with the samples.
+sum of Diracs. The OT matrix is plotted with the samples.
 
 """
 
diff --git a/examples/others/plot_factored_coupling.py b/examples/others/plot_factored_coupling.py
index b5b1c9f..02074d7 100644
--- a/examples/others/plot_factored_coupling.py
+++ b/examples/others/plot_factored_coupling.py
@@ -47,8 +47,8 @@ pl.title('Source and target distributions')
 
 
 # %%
-# Compute Factore OT and exact OT solutions
-# --------------------------------------
+# Compute Factored OT and exact OT solutions
+# ------------------------------------------
 
 #%% EMD
 M = ot.dist(xs, xt)
@@ -61,7 +61,7 @@ Ga, Gb, xb = ot.factored_optimal_transport(xs, xt, a, b, r=4)
 
 # %%
 # Plot factored OT and exact OT solutions
-# --------------------------------------
+# ---------------------------------------
 
 pl.figure(2, (14, 4))
 
diff --git a/examples/others/plot_logo.py b/examples/others/plot_logo.py
index bb4f640..b032801 100644
--- a/examples/others/plot_logo.py
+++ b/examples/others/plot_logo.py
@@ -8,7 +8,7 @@ Logo of the POT toolbox
 In this example we plot the logo of the POT toolbox.
 
 This logo is that it is done 100% in Python and generated using
-matplotlib and ploting teh solution of the EMD solver from POT.
+matplotlib and plotting the solution of the EMD solver from POT.
 
 """
 
diff --git a/examples/others/plot_screenkhorn_1D.py b/examples/others/plot_screenkhorn_1D.py
index 2023649..3640b88 100644
--- a/examples/others/plot_screenkhorn_1D.py
+++ b/examples/others/plot_screenkhorn_1D.py
@@ -62,8 +62,8 @@ ot.plot.plot1D_mat(a, b, M, 'Cost matrix M')
 
 # Screenkhorn
 lambd = 2e-03  # entropy parameter
-ns_budget = 30  # budget number of points to be keeped in the source distribution
-nt_budget = 30  # budget number of points to be keeped in the target distribution
+ns_budget = 30  # budget number of points to be kept in the source distribution
+nt_budget = 30  # budget number of points to be kept in the target distribution
 
 G_screen = screenkhorn(a, b, M, lambd, ns_budget, nt_budget, uniform=False, restricted=True, verbose=True)
 pl.figure(4, figsize=(5, 5))
diff --git a/examples/others/plot_stochastic.py b/examples/others/plot_stochastic.py
index 3a1ef31..f3afb0b 100644
--- a/examples/others/plot_stochastic.py
+++ b/examples/others/plot_stochastic.py
@@ -3,7 +3,7 @@
 Stochastic examples
 ===================
 
-This example is designed to show how to use the stochatic optimization
+This example is designed to show how to use the stochastic optimization
 algorithms for discrete and semi-continuous measures from the POT library.
 
 [18] Genevay, A., Cuturi, M., Peyré, G. & Bach, F.
@@ -61,7 +61,7 @@ print(sag_pi)
 # Semi-Continuous Case
 # ````````````````````
 #
-# Sample one general measure a, one discrete measures b for the semicontinous
+# Sample one general measure a, one discrete measure b for the semicontinuous
 # case, the points where source and target measures are defined and compute the
 # cost matrix.
 
@@ -80,7 +80,7 @@ Y_target = rng.randn(n_target, 2)
 M = ot.dist(X_source, Y_target)
 
 #############################################################################
-# Call the "ASGD" method to find the transportation matrix in the semicontinous
+# Call the "ASGD" method to find the transportation matrix in the semicontinuous
 # case.
 
 method = "ASGD"
diff --git a/examples/plot_Intro_OT.py b/examples/plot_Intro_OT.py
index 219aa51..1c51360 100644
--- a/examples/plot_Intro_OT.py
+++ b/examples/plot_Intro_OT.py
@@ -67,7 +67,7 @@ help(ot.dist)
 # We extracted from this search their positions and generated fictional
 # production and sale number (that both sum to the same value).
 #
-# We have acess to the position of Bakeries ``bakery_pos`` and their
+# We have access to the position of Bakeries ``bakery_pos`` and their
 # respective production ``bakery_prod`` which describe the source
 # distribution. The Cafés where the croissants are sold are defined also by
 # their position ``cafe_pos`` and ``cafe_prod``, and describe the target
@@ -166,10 +166,10 @@ time_emd = time.time() - start
 # The function returns the transport matrix, which we can then visualize (next section).
 
 ##############################################################################
-# Transportation plan vizualization
+# Transportation plan visualization
 # `````````````````````````````````
 #
-# A good vizualization of the OT matrix in the 2D plane is to denote the
+# A good visualization of the OT matrix in the 2D plane is to denote the
 # transportation of mass between a Bakery and a Café by a line. This can easily
 # be done with a double ``for`` loop.
 #
diff --git a/examples/plot_OT_1D_smooth.py b/examples/plot_OT_1D_smooth.py
index ff51b8a..626938c 100644
--- a/examples/plot_OT_1D_smooth.py
+++ b/examples/plot_OT_1D_smooth.py
@@ -94,6 +94,6 @@ max_nz = 2  # two non-zero entries are permitted per column of the OT plan
 Gsc = ot.smooth.smooth_ot_dual(
     a, b, M, lambd, reg_type='sparsity_constrained', max_nz=max_nz)
 pl.figure(5, figsize=(5, 5))
-ot.plot.plot1D_mat(a, b, Gsc, 'Sparsity contrained OT matrix; k=2.')
+ot.plot.plot1D_mat(a, b, Gsc, 'Sparsity constrained OT matrix; k=2.')
 
 pl.show()
diff --git a/examples/plot_OT_2D_samples.py b/examples/plot_OT_2D_samples.py
index 1d82fb8..4b98892 100644
--- a/examples/plot_OT_2D_samples.py
+++ b/examples/plot_OT_2D_samples.py
@@ -4,8 +4,8 @@
 Optimal Transport between 2D empirical distributions
 ====================================================
 
-Illustration of 2D optimal transport between discributions that are weighted
-sum of diracs. The OT matrix is plotted with the samples.
+Illustration of 2D optimal transport between distributions that are weighted
+sum of Diracs. The OT matrix is plotted with the samples.
 
 """
 
@@ -105,7 +105,7 @@ pl.show()
 
 
 ##############################################################################
-# Emprirical Sinkhorn
+# Empirical Sinkhorn
 # -------------------
 
 #%% sinkhorn
diff --git a/examples/plot_OT_L1_vs_L2.py b/examples/plot_OT_L1_vs_L2.py
index 7a08197..e1d102c 100644
--- a/examples/plot_OT_L1_vs_L2.py
+++ b/examples/plot_OT_L1_vs_L2.py
@@ -4,7 +4,7 @@
 Optimal Transport with different ground metrics
 ================================================
 
-2D OT on empirical distributio with different ground metric.
+2D OT on empirical distributions with different ground metrics.
 
 Stole the figure idea from Fig. 1 and 2 in
 https://arxiv.org/pdf/1706.07650.pdf
diff --git a/examples/plot_compute_emd.py b/examples/plot_compute_emd.py
index 36cc7da..32d63e8 100644
--- a/examples/plot_compute_emd.py
+++ b/examples/plot_compute_emd.py
@@ -4,7 +4,7 @@
 OT distances in 1D
 ==================
 
-Shows how to compute multiple Wassersein and Sinkhorn with two different
+Shows how to compute multiple Wasserstein and Sinkhorn distances with two different
 ground metrics and plot their values for different distributions.
 
 
@@ -76,7 +76,7 @@ pl.tight_layout()
 #%% Compute and plot distributions and loss matrix
 
 d_emd = ot.emd2(a, B, M)  # direct computation of OT loss
-d_emd2 = ot.emd2(a, B, M2)  # direct computation of OT loss with metrixc M2
+d_emd2 = ot.emd2(a, B, M2)  # direct computation of OT loss with metric M2
 d_tv = [np.sum(abs(a - B[:, i])) for i in range(n_target)]
 
 pl.figure(2)
diff --git a/examples/sliced-wasserstein/plot_variance.py b/examples/sliced-wasserstein/plot_variance.py
index f12b522..2293247 100644
--- a/examples/sliced-wasserstein/plot_variance.py
+++ b/examples/sliced-wasserstein/plot_variance.py
@@ -83,6 +83,6 @@ pl.xscale('log')
 
 pl.xlabel("Number of projections")
 pl.ylabel("Distance")
-pl.title('Sliced Wasserstein Distance with 95% confidence inverval')
+pl.title('Sliced Wasserstein Distance with 95% confidence interval')
 
 pl.show()
diff --git a/examples/sliced-wasserstein/plot_variance_ssw.py b/examples/sliced-wasserstein/plot_variance_ssw.py
index 83d458f..f5fc35f 100644
--- a/examples/sliced-wasserstein/plot_variance_ssw.py
+++ b/examples/sliced-wasserstein/plot_variance_ssw.py
@@ -106,6 +106,6 @@ pl.xscale('log')
 
 pl.xlabel("Number of projections")
 pl.ylabel("Distance")
-pl.title('Spherical Sliced Wasserstein Distance with 95% confidence inverval')
+pl.title('Spherical Sliced Wasserstein Distance with 95% confidence interval')
 
 pl.show()
diff --git a/examples/unbalanced-partial/plot_UOT_barycenter_1D.py b/examples/unbalanced-partial/plot_UOT_barycenter_1D.py
index 8d227c0..f747055 100644
--- a/examples/unbalanced-partial/plot_UOT_barycenter_1D.py
+++ b/examples/unbalanced-partial/plot_UOT_barycenter_1D.py
@@ -4,7 +4,7 @@
 1D Wasserstein barycenter demo for Unbalanced distributions
 ===========================================================
 
-This example illustrates the computation of regularized Wassersyein Barycenter
+This example illustrates the computation of regularized Wasserstein Barycenter
 as proposed in [10] for Unbalanced inputs.
 
 
diff --git a/examples/unbalanced-partial/plot_regpath.py b/examples/unbalanced-partial/plot_regpath.py
index 782e8c2..d1f2042 100644
--- a/examples/unbalanced-partial/plot_regpath.py
+++ b/examples/unbalanced-partial/plot_regpath.py
@@ -60,7 +60,7 @@ pl.show()
 
 ##############################################################################
 # Compute semi-relaxed and fully relaxed regularization paths
-# -----------
+# -----------------------------------------------------------
 
 #%%
 final_gamma = 1e-8
@@ -72,9 +72,9 @@ t2, t_list2, g_list2 = ot.regpath.regularization_path(a, b, M, reg=final_gamma,
 
 ##############################################################################
 # Plot the regularization path
-# ----------------
+# ----------------------------
 #
-# The OT plan is ploted as a function of $\gamma$ that is the inverse of the
+# The OT plan is plotted as a function of $\gamma$ that is the inverse of the
 # weight on the marginal relaxations.
 
 #%% fully relaxed l2-penalized UOT
@@ -109,7 +109,7 @@ pl.show()
 
 # %%
 # Animation of the regpath for UOT l2
-# ------------------------
+# -----------------------------------
 
 nv = 100
 g_list_v = np.logspace(-.5, -2.5, nv)
@@ -149,7 +149,7 @@ ani = animation.FuncAnimation(pl.gcf(), _update_plot, nv, interval=50, repeat_de
 
 ##############################################################################
 # Plot the semi-relaxed regularization path
-# -------------------
+# -----------------------------------------
 
 #%% semi-relaxed l2-penalized UOT
 
@@ -181,7 +181,7 @@ pl.show()
 
 # %%
 # Animation of the regpath for semi-relaxed UOT l2
-# ------------------------
+# ------------------------------------------------
 
 nv = 100
 g_list_v = np.logspace(2.5, -2, nv)
diff --git a/ot/backend.py b/ot/backend.py
index 0dd6fb8..a82c448 100644
--- a/ot/backend.py
+++ b/ot/backend.py
@@ -27,7 +27,7 @@ Examples
         np_config.enable_numpy_behavior()
 
 Performance
---------
+-----------
 
 - CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
 - GPU: Tesla V100-SXM2-32GB
diff --git a/ot/bregman.py b/ot/bregman.py
index 20bef7e..4503ffc 100644
--- a/ot/bregman.py
+++ b/ot/bregman.py
@@ -150,7 +150,7 @@ def sinkhorn(a, b, M, reg, method='sinkhorn', numItermax=1000, stopThr=1e-9,
     ot.bregman.sinkhorn_knopp : Classic Sinkhorn :ref:`[2] <references-sinkhorn>`
     ot.bregman.sinkhorn_stabilized: Stabilized sinkhorn
         :ref:`[9] <references-sinkhorn>` :ref:`[10] <references-sinkhorn>`
-    ot.bregman.sinkhorn_epsilon_scaling: Sinkhorn with epslilon scaling
+    ot.bregman.sinkhorn_epsilon_scaling: Sinkhorn with epsilon scaling
         :ref:`[9] <references-sinkhorn>` :ref:`[10] <references-sinkhorn>`
 
     """
@@ -384,6 +384,7 @@ def sinkhorn_knopp(a, b, M, reg, numItermax=1000, stopThr=1e-9,
              \gamma^T \mathbf{1} &= \mathbf{b}
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`dim_a`, `dim_b`) metric cost matrix
@@ -572,6 +573,7 @@ def sinkhorn_log(a, b, M, reg, numItermax=1000, stopThr=1e-9, verbose=False,
              \gamma^T \mathbf{1} &= \mathbf{b}
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`dim_a`, `dim_b`) metric cost matrix
@@ -784,6 +786,7 @@ def greenkhorn(a, b, M, reg, numItermax=10000, stopThr=1e-9, verbose=False,
              \gamma^T \mathbf{1} &= \mathbf{b}
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`dim_a`, `dim_b`) metric cost matrix
@@ -950,6 +953,7 @@ def sinkhorn_stabilized(a, b, M, reg, numItermax=1000, tau=1e3, stopThr=1e-9,
              \gamma^T \mathbf{1} &= \mathbf{b}
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`dim_a`, `dim_b`) metric cost matrix
@@ -2657,7 +2661,7 @@ def unmix(a, D, M, M0, h0, reg, reg0, alpha, numItermax=1000,
     ----------
 
     .. [4] S. Nakhostin, N. Courty, R. Flamary, D. Tuia, T. Corpetti,
-        Supervised planetary unmixing with optimal transport, Whorkshop
+        Supervised planetary unmixing with optimal transport, Workshop
         on Hyperspectral Image and Signal Processing :
         Evolution in Remote Sensing (WHISPERS), 2016.
     """
@@ -2908,6 +2912,7 @@ def empirical_sinkhorn(X_s, X_t, reg, a=None, b=None, metric='sqeuclidean',
              \gamma^T \mathbf{1} &= \mathbf{b}
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`n_samples_a`, `n_samples_b`) metric cost matrix
@@ -3104,6 +3109,7 @@ def empirical_sinkhorn2(X_s, X_t, reg, a=None, b=None, metric='sqeuclidean',
              \gamma^T \mathbf{1} &= \mathbf{b}
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`n_samples_a`, `n_samples_b`) metric cost matrix
@@ -3257,7 +3263,6 @@ def empirical_sinkhorn_divergence(X_s, X_t, reg, a=None, b=None, metric='sqeucli
     sinkhorn divergence :math:`S`:
 
     .. math::
-
         W &= \min_\gamma \quad \langle \gamma, \mathbf{M} \rangle_F +
         \mathrm{reg} \cdot\Omega(\gamma)
 
@@ -3287,6 +3292,7 @@ def empirical_sinkhorn_divergence(X_s, X_t, reg, a=None, b=None, metric='sqeucli
              \gamma_b^T \mathbf{1} &= \mathbf{b}
 
              \gamma_b &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` (resp. :math:`\mathbf{M_a}`, :math:`\mathbf{M_b}`)
@@ -3352,7 +3358,7 @@ def empirical_sinkhorn_divergence(X_s, X_t, reg, a=None, b=None, metric='sqeucli
     ----------
     .. [23] Aude Genevay, Gabriel Peyré, Marco Cuturi, Learning Generative
         Models with Sinkhorn Divergences,  Proceedings of the Twenty-First
-        International Conference on Artficial Intelligence and Statistics,
+        International Conference on Artificial Intelligence and Statistics,
         (AISTATS) 21, 2018
     '''
     X_s, X_t = list_to_array(X_s, X_t)
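The `ot/bregman.py` docstrings above describe the entropic OT problem: minimize the inner product of the plan with M plus reg times Omega(gamma), subject to gamma 1 = a, gamma^T 1 = b and gamma >= 0. The classic Sinkhorn-Knopp scheme solves this by alternately rescaling the rows and columns of the Gibbs kernel exp(-M / reg). Below is a minimal NumPy sketch of that iteration. The name `sinkhorn_knopp_sketch` is hypothetical, and the sketch leaves out the logging, backend support and numerical stabilization that the stabilized and log-domain variants listed above address.

```python
import numpy as np


def sinkhorn_knopp_sketch(a, b, M, reg, num_iter=1000, tol=1e-9):
    """Minimal Sinkhorn-Knopp for entropic OT (float inputs, no log-domain tricks)."""
    K = np.exp(-M / reg)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(num_iter):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns to match marginal b exactly
        # after the v-update the column marginal is exact; check the row marginal
        if np.abs(u * (K @ v) - a).sum() < tol:
            break
    return u[:, None] * K * v[None, :]  # gamma = diag(u) K diag(v)


a = np.array([0.5, 0.5])
b = np.array([0.25, 0.75])
M = np.array([[0.0, 1.0], [1.0, 0.0]])
G = sinkhorn_knopp_sketch(a, b, M, reg=0.5)
```

For small `reg`, the entries of exp(-M / reg) can underflow to zero. That is the motivation for the stabilized and log-domain solvers referenced in this file.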
diff --git a/ot/coot.py b/ot/coot.py
index 66dd2c8..477529f 100644
--- a/ot/coot.py
+++ b/ot/coot.py
@@ -74,7 +74,7 @@ def co_optimal_transport(X, Y, wx_samp=None, wx_feat=None, wy_samp=None, wy_feat
         Sinkhorn solver. If epsilon is scalar, then the same epsilon is applied to
         both regularization of sample and feature couplings.
     alpha : scalar or indexable object of length 2, float or int, optional (default = 0)
-        Coeffficient parameter of linear terms with respect to the sample and feature couplings.
+        Coefficient parameter of linear terms with respect to the sample and feature couplings.
         If alpha is scalar, then the same alpha is applied to both linear terms.
     M_samp : (n_sample_x, n_sample_y), float, optional (default = None)
         Sample matrix with respect to the linear term on sample coupling.
@@ -295,7 +295,7 @@ def co_optimal_transport2(X, Y, wx_samp=None, wx_feat=None, wy_samp=None, wy_fea
         + \varepsilon_1 \mathbf{KL}(\mathbf{P} | \mathbf{w}_{xs} \mathbf{w}_{ys}^T)
         + \varepsilon_2 \mathbf{KL}(\mathbf{Q} | \mathbf{w}_{xf} \mathbf{w}_{yf}^T)
 
-    Where :
+    where :
 
     - :math:`\mathbf{X}`: Data matrix in the source space
     - :math:`\mathbf{Y}`: Data matrix in the target space
@@ -333,7 +333,7 @@ def co_optimal_transport2(X, Y, wx_samp=None, wx_feat=None, wy_samp=None, wy_fea
         Sinkhorn solver. If epsilon is scalar, then the same epsilon is applied to
         both regularization of sample and feature couplings.
     alpha : scalar or indexable object of length 2, float or int, optional (default = 0)
-        Coeffficient parameter of linear terms with respect to the sample and feature couplings.
+        Coefficient parameter of linear terms with respect to the sample and feature couplings.
         If alpha is scalar, then the same alpha is applied to both linear terms.
     M_samp : (n_sample_x, n_sample_y), float, optional (default = None)
         Sample matrix with respect to the linear term on sample coupling.
@@ -345,7 +345,6 @@ def co_optimal_transport2(X, Y, wx_samp=None, wx_feat=None, wy_samp=None, wy_fea
             tuples of 2 vectors of size (n_sample_x, n_sample_y) and (n_feature_x, n_feature_y).
             Initialization of sample and feature dual vectors
             if using Sinkhorn algorithm. Zero vectors by default.
-
             - "pi_sample" and "pi_feature" whose values are matrices
             of size (n_sample_x, n_sample_y) and (n_feature_x, n_feature_y).
             Initialization of sample and feature couplings.
@@ -382,7 +381,7 @@ def co_optimal_transport2(X, Y, wx_samp=None, wx_feat=None, wy_samp=None, wy_fea
     float
         CO-Optimal Transport distance.
     dict
-        Contains logged informations from :any:`co_optimal_transport` solver.
+        Contains logged information from :any:`co_optimal_transport` solver.
         Only returned if `log` parameter is True
 
     References
diff --git a/ot/da.py b/ot/da.py
index 5067a69..886b7ee 100644
--- a/ot/da.py
+++ b/ot/da.py
@@ -28,7 +28,7 @@ def sinkhorn_lpl1_mm(a, labels_a, b, M, reg, eta=0.1, numItermax=10,
                      numInnerItermax=200, stopInnerThr=1e-9, verbose=False,
                      log=False):
     r"""
-    Solve the entropic regularization optimal transport problem with nonconvex
+    Solve the entropic regularization optimal transport problem with non-convex
     group lasso regularization
 
     The function solves the following optimization problem:
@@ -172,13 +172,13 @@ def sinkhorn_l1l2_gl(a, labels_a, b, M, reg, eta=0.1, numItermax=10,
     - :math:`\mathbf{M}` is the (`ns`, `nt`) metric cost matrix
     - :math:`\Omega_e` is the entropic regularization term
       :math:`\Omega_e(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})`
-    - :math:`\Omega_g` is the group lasso regulaization term
+    - :math:`\Omega_g` is the group lasso regularization term
       :math:`\Omega_g(\gamma)=\sum_{i,c} \|\gamma_{i,\mathcal{I}_c}\|^2`
       where  :math:`\mathcal{I}_c` are the index of samples from class
       `c` in the source domain.
     - :math:`\mathbf{a}` and :math:`\mathbf{b}` are source and target weights (sum to 1)
 
-    The algorithm used for solving the problem is the generalised conditional
+    The algorithm used for solving the problem is the generalized conditional
     gradient as proposed in :ref:`[5, 7] <references-sinkhorn-l1l2-gl>`.
 
 
@@ -296,7 +296,7 @@ def joint_OT_mapping_linear(xs, xt, mu=1, eta=0.001, bias=False, verbose=False,
     material of :ref:`[8] <references-joint-OT-mapping-linear>`) using the bias optional argument.
 
     The algorithm used for solving the problem is the block coordinate
-    descent that alternates between updates of :math:`\mathbf{G}` (using conditionnal gradient)
+    descent that alternates between updates of :math:`\mathbf{G}` (using conditional gradient)
     and the update of :math:`\mathbf{L}` using a classical least square solver.
 
 
@@ -494,7 +494,7 @@ def joint_OT_mapping_kernel(xs, xt, mu=1, eta=0.001, kerneltype='gaussian',
     material of :ref:`[8] <references-joint-OT-mapping-kernel>`) using the bias optional argument.
 
     The algorithm used for solving the problem is the block coordinate
-    descent that alternates between updates of :math:`\mathbf{G}` (using conditionnal gradient)
+    descent that alternates between updates of :math:`\mathbf{G}` (using conditional gradient)
     and the update of :math:`\mathbf{L}` using a classical kernel least square solver.
 
 
diff --git a/ot/datasets.py b/ot/datasets.py
index a839074..3d633f4 100644
--- a/ot/datasets.py
+++ b/ot/datasets.py
@@ -22,7 +22,7 @@ def make_1D_gauss(n, m, s):
     m : float
         mean value of the gaussian distribution
     s : float
-        standard deviaton of the gaussian distribution
+        standard deviation of the gaussian distribution
 
     Returns
     -------
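The `make_1D_gauss` docstring documents a histogram of `n` bins with mean `m` and standard deviation `s`. The function body is not shown in this diff. Assuming it evaluates an unnormalized Gaussian on the bin indices 0..n-1 and normalizes the result to sum to one, a hypothetical sketch would be:

```python
import numpy as np


def make_1D_gauss_sketch(n, m, s):
    """Hypothetical discretized 1D Gaussian histogram normalized to sum to 1."""
    x = np.arange(n, dtype=np.float64)  # bin positions 0, 1, ..., n - 1
    h = np.exp(-((x - m) ** 2) / (2 * s ** 2))
    return h / h.sum()


h = make_1D_gauss_sketch(100, m=20, s=5)
```

Normalizing to unit mass is what allows such histograms to be passed directly as the weights `a` and `b` of a balanced OT solver.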
diff --git a/ot/dr.py b/ot/dr.py
index b92cd14..47c8733 100644
--- a/ot/dr.py
+++ b/ot/dr.py
@@ -5,7 +5,7 @@ Dimension reduction with OT
 
 .. warning::
     Note that by default the module is not imported in :mod:`ot`. In order to
-    use it you need to explicitely import :mod:`ot.dr`
+    use it you need to explicitly import :mod:`ot.dr`
 
 """
 
@@ -83,7 +83,7 @@ def fda(X, y, p=2, reg=1e-16):
     y : ndarray, shape (n,)
         Labels for training samples.
     p : int, optional
-        Size of dimensionnality reduction.
+        Size of dimensionality reduction.
     reg : float, optional
         Regularization term >0 (ridge regularization)
 
@@ -164,7 +164,7 @@ def wda(X, y, p=2, reg=1, k=10, solver=None, sinkhorn_method='sinkhorn', maxiter
     y : ndarray, shape (n,)
         Labels for training samples.
     p : int, optional
-        Size of dimensionnality reduction.
+        Size of dimensionality reduction.
     reg : float, optional
         Regularization term >0 (entropic regularization)
     solver : None | str, optional
@@ -175,7 +175,7 @@ def wda(X, y, p=2, reg=1, k=10, solver=None, sinkhorn_method='sinkhorn', maxiter
     P0 : ndarray, shape (d, p)
         Initial starting point for projection.
     normalize : bool, optional
-        Normalise the Wasserstaiun distance by the average distance on P0 (default : False)
+        Normalize the Wasserstein distance by the average distance on P0 (default: False)
     verbose : int, optional
         Print information along iterations.
 
diff --git a/ot/gromov/_bregman.py b/ot/gromov/_bregman.py
index b0cccfb..aa25f1f 100644
--- a/ot/gromov/_bregman.py
+++ b/ot/gromov/_bregman.py
@@ -69,7 +69,7 @@ def entropic_gromov_wasserstein(C1, C2, p, q, loss_fun, epsilon, symmetric=None,
     symmetric : bool, optional
         Either C1 and C2 are to be assumed symmetric or not.
         If let to its default None value, a symmetry test will be conducted.
-        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymetric).
+        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
     G0: array-like, shape (ns,nt), optional
         If None the initial transport plan of the solver is pq^T.
         Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.
@@ -152,7 +152,7 @@ def entropic_gromov_wasserstein(C1, C2, p, q, loss_fun, epsilon, symmetric=None,
 def entropic_gromov_wasserstein2(C1, C2, p, q, loss_fun, epsilon, symmetric=None, G0=None,
                                  max_iter=1000, tol=1e-9, verbose=False, log=False):
     r"""
-    Returns the entropic gromov-wasserstein discrepancy between the two measured similarity matrices :math:`(\mathbf{C_1}, \mathbf{p})` and :math:`(\mathbf{C_2}, \mathbf{q})`
+    Returns the entropic Gromov-Wasserstein discrepancy between the two measured similarity matrices :math:`(\mathbf{C_1}, \mathbf{p})` and :math:`(\mathbf{C_2}, \mathbf{q})`
 
     The function solves the following optimization problem:
 
@@ -194,7 +194,7 @@ def entropic_gromov_wasserstein2(C1, C2, p, q, loss_fun, epsilon, symmetric=None
     symmetric : bool, optional
         Either C1 and C2 are to be assumed symmetric or not.
         If let to its default None value, a symmetry test will be conducted.
-        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymetric).
+        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
     G0: array-like, shape (ns,nt), optional
         If None the initial transport plan of the solver is pq^T.
         Otherwise G0 must satisfy marginal constraints and will be used as initial transport of the solver.
diff --git a/ot/gromov/_dictionary.py b/ot/gromov/_dictionary.py
index 5b32671..0d618d1 100644
--- a/ot/gromov/_dictionary.py
+++ b/ot/gromov/_dictionary.py
@@ -148,7 +148,7 @@ def gromov_wasserstein_dictionary_learning(Cs, D, nt, reg=0., ps=None, q=None, e
             Ts = [None] * batch_size
 
             for batch_idx, C_idx in enumerate(batch):
-                # BCD solver for Gromov-Wassersteisn linear unmixing used independently on each structure of the sampled batch
+                # BCD solver for Gromov-Wasserstein linear unmixing used independently on each structure of the sampled batch
                 unmixings[batch_idx], Cs_embedded[batch_idx], Ts[batch_idx], current_loss = gromov_wasserstein_linear_unmixing(
                     Cs[C_idx], Cdict, reg=reg, p=ps[C_idx], q=q, tol_outer=tol_outer, tol_inner=tol_inner,
                     max_iter_outer=max_iter_outer, max_iter_inner=max_iter_inner, symmetric=symmetric, **kwargs
@@ -252,7 +252,7 @@ def gromov_wasserstein_linear_unmixing(C, Cdict, reg=0., p=None, q=None, tol_out
     Returns
     -------
     w: array-like, shape (D,)
-        gromov-wasserstein linear unmixing of :math:`(\mathbf{C},\mathbf{p})` onto the span of the dictionary.
+        Gromov-Wasserstein linear unmixing of :math:`(\mathbf{C},\mathbf{p})` onto the span of the dictionary.
     Cembedded: array-like, shape (nt,nt)
         embedded structure of :math:`(\mathbf{C},\mathbf{p})` onto the dictionary, :math:`\sum_d w_d\mathbf{C_{dict}[d]}`.
     T: array-like (ns, nt)
@@ -559,7 +559,7 @@ def fused_gromov_wasserstein_dictionary_learning(Cs, Ys, D, nt, alpha, reg=0., p
         Feature matrices composing the dictionary.
         The dictionary leading to the best loss over an epoch is saved and returned.
     log: dict
-        If use_log is True, contains loss evolutions by batches and epoches.
+        If use_log is True, contains loss evolutions by batches and epochs.
     References
     -------
     .. [38] C. Vincent-Cuaz, T. Vayer, R. Flamary, M. Corneli, N. Courty, Online
@@ -634,7 +634,7 @@ def fused_gromov_wasserstein_dictionary_learning(Cs, Ys, D, nt, alpha, reg=0., p
             Ts = [None] * batch_size
 
             for batch_idx, C_idx in enumerate(batch):
-                # BCD solver for Gromov-Wassersteisn linear unmixing used independently on each structure of the sampled batch
+                # BCD solver for Gromov-Wasserstein linear unmixing used independently on each structure of the sampled batch
                 unmixings[batch_idx], Cs_embedded[batch_idx], Ys_embedded[batch_idx], Ts[batch_idx], current_loss = fused_gromov_wasserstein_linear_unmixing(
                     Cs[C_idx], Ys[C_idx], Cdict, Ydict, alpha, reg=reg, p=ps[C_idx], q=q,
                     tol_outer=tol_outer, tol_inner=tol_inner, max_iter_outer=max_iter_outer, max_iter_inner=max_iter_inner, symmetric=symmetric, **kwargs
@@ -736,7 +736,7 @@ def fused_gromov_wasserstein_linear_unmixing(C, Y, Cdict, Ydict, alpha, reg=0.,
     Returns
     -------
     w: array-like, shape (D,)
-        fused gromov-wasserstein linear unmixing of (C,Y,p) onto the span of the dictionary.
+        fused Gromov-Wasserstein linear unmixing of (C,Y,p) onto the span of the dictionary.
     Cembedded: array-like, shape (nt,nt)
         embedded structure of :math:`(\mathbf{C},\mathbf{Y}, \mathbf{p})` onto the dictionary, :math:`\sum_d w_d\mathbf{C_{dict}[d]}`.
     Yembedded: array-like, shape (nt,d)
diff --git a/ot/gromov/_gw.py b/ot/gromov/_gw.py
index bc4719d..cdfa9a3 100644
--- a/ot/gromov/_gw.py
+++ b/ot/gromov/_gw.py
@@ -26,7 +26,7 @@ from ._utils import update_square_loss, update_kl_loss
 def gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss', symmetric=None, log=False, armijo=False, G0=None,
                        max_iter=1e4, tol_rel=1e-9, tol_abs=1e-9, **kwargs):
     r"""
-    Returns the gromov-wasserstein transport between :math:`(\mathbf{C_1}, \mathbf{p})` and :math:`(\mathbf{C_2}, \mathbf{q})`
+    Returns the Gromov-Wasserstein transport between :math:`(\mathbf{C_1}, \mathbf{p})` and :math:`(\mathbf{C_2}, \mathbf{q})`
 
     The function solves the following optimization problem:
 
@@ -39,6 +39,7 @@ def gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss', symmetric=None, log
              \mathbf{\gamma}^T \mathbf{1} &= \mathbf{q}
 
              \mathbf{\gamma} &\geq 0
+
     Where :
 
     - :math:`\mathbf{C_1}`: Metric cost matrix in the source space
@@ -68,7 +69,7 @@ def gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss', symmetric=None, log
     symmetric : bool, optional
         Either C1 and C2 are to be assumed symmetric or not.
         If let to its default None value, a symmetry test will be conducted.
-        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymetric).
+        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
     verbose : bool, optional
         Print information along iterations
     log : bool, optional
@@ -170,7 +171,7 @@ def gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss', symmetric=None, log
 def gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss', symmetric=None, log=False, armijo=False, G0=None,
                         max_iter=1e4, tol_rel=1e-9, tol_abs=1e-9, **kwargs):
     r"""
-    Returns the gromov-wasserstein discrepancy between :math:`(\mathbf{C_1}, \mathbf{p})` and :math:`(\mathbf{C_2}, \mathbf{q})`
+    Returns the Gromov-Wasserstein discrepancy between :math:`(\mathbf{C_1}, \mathbf{p})` and :math:`(\mathbf{C_2}, \mathbf{q})`
 
     The function solves the following optimization problem:
 
@@ -183,6 +184,7 @@ def gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss', symmetric=None, lo
              \mathbf{\gamma}^T \mathbf{1} &= \mathbf{q}
 
              \mathbf{\gamma} &\geq 0
+
     Where :
 
     - :math:`\mathbf{C_1}`: Metric cost matrix in the source space
@@ -216,7 +218,7 @@ def gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss', symmetric=None, lo
     symmetric : bool, optional
         Either C1 and C2 are to be assumed symmetric or not.
         If let to its default None value, a symmetry test will be conducted.
-        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymetric).
+        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
     verbose : bool, optional
         Print information along iterations
     log : bool, optional
@@ -241,7 +243,7 @@ def gromov_wasserstein2(C1, C2, p, q, loss_fun='square_loss', symmetric=None, lo
     gw_dist : float
         Gromov-Wasserstein distance
     log : dict
-        convergence information and Coupling marix
+        Convergence information and coupling matrix
 
     References
     ----------
@@ -310,6 +312,7 @@ def fused_gromov_wasserstein(M, C1, C2, p, q, loss_fun='square_loss', symmetric=
         which can lead to copy overhead on GPU arrays.
     .. note:: All computations in the conjugate gradient solver are done with
         numpy to limit memory overhead.
+
     The algorithm used for solving the problem is conditional gradient as discussed in :ref:`[24] <references-fused-gromov-wasserstein>`
 
     Parameters
@@ -329,7 +332,7 @@ def fused_gromov_wasserstein(M, C1, C2, p, q, loss_fun='square_loss', symmetric=
     symmetric : bool, optional
         Either C1 and C2 are to be assumed symmetric or not.
         If let to its default None value, a symmetry test will be conducted.
-        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymetric).
+        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
     alpha : float, optional
         Trade-off parameter (0 < alpha < 1)
     armijo : bool, optional
@@ -503,7 +506,7 @@ def fused_gromov_wasserstein2(M, C1, C2, p, q, loss_fun='square_loss', symmetric
     Returns
     -------
     fgw-distance : float
-        Fused gromov wasserstein distance for the given parameters.
+        Fused Gromov-Wasserstein distance for the given parameters.
     log : dict
         Log dictionary return only if log==True in parameters.
 
diff --git a/ot/gromov/_semirelaxed.py b/ot/gromov/_semirelaxed.py
index 638bb1c..cb2bf28 100644
--- a/ot/gromov/_semirelaxed.py
+++ b/ot/gromov/_semirelaxed.py
@@ -21,7 +21,7 @@ from ._utils import init_matrix_semirelaxed, gwloss, gwggrad
 def semirelaxed_gromov_wasserstein(C1, C2, p, loss_fun='square_loss', symmetric=None, log=False, G0=None,
                                    max_iter=1e4, tol_rel=1e-9, tol_abs=1e-9, **kwargs):
     r"""
-    Returns the semi-relaxed gromov-wasserstein divergence transport from :math:`(\mathbf{C_1}, \mathbf{p})` to :math:`\mathbf{C_2}`
+    Returns the semi-relaxed Gromov-Wasserstein divergence transport from :math:`(\mathbf{C_1}, \mathbf{p})` to :math:`\mathbf{C_2}`
 
     The function solves the following optimization problem:
 
@@ -32,6 +32,7 @@ def semirelaxed_gromov_wasserstein(C1, C2, p, loss_fun='square_loss', symmetric=
         s.t. \ \mathbf{\gamma} \mathbf{1} &= \mathbf{p}
 
              \mathbf{\gamma} &\geq 0
+
     Where :
 
     - :math:`\mathbf{C_1}`: Metric cost matrix in the source space
@@ -58,7 +59,7 @@ def semirelaxed_gromov_wasserstein(C1, C2, p, loss_fun='square_loss', symmetric=
     symmetric : bool, optional
         Either C1 and C2 are to be assumed symmetric or not.
         If let to its default None value, a symmetry test will be conducted.
-        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymetric).
+        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
     verbose : bool, optional
         Print information along iterations
     log : bool, optional
@@ -156,6 +157,7 @@ def semirelaxed_gromov_wasserstein2(C1, C2, p, loss_fun='square_loss', symmetric
         s.t. \ \mathbf{\gamma} \mathbf{1} &= \mathbf{p}
 
              \mathbf{\gamma} &\geq 0
+
     Where :
 
     - :math:`\mathbf{C_1}`: Metric cost matrix in the source space
@@ -166,6 +168,7 @@ def semirelaxed_gromov_wasserstein2(C1, C2, p, loss_fun='square_loss', symmetric
 
     Note that when using backends, this loss function is differentiable wrt the
     matrices (C1, C2) but not yet for the weights p.
+
     .. note:: This function is backend-compatible and will work on arrays
         from all compatible backends. However all the steps in the conditional
         gradient are not differentiable.
@@ -184,7 +187,7 @@ def semirelaxed_gromov_wasserstein2(C1, C2, p, loss_fun='square_loss', symmetric
     symmetric : bool, optional
         Either C1 and C2 are to be assumed symmetric or not.
         If let to its default None value, a symmetry test will be conducted.
-        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymetric).
+        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
     verbose : bool, optional
         Print information along iterations
     log : bool, optional
@@ -278,7 +281,7 @@ def semirelaxed_fused_gromov_wasserstein(M, C1, C2, p, loss_fun='square_loss', s
     symmetric : bool, optional
         Either C1 and C2 are to be assumed symmetric or not.
         If let to its default None value, a symmetry test will be conducted.
-        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymetric).
+        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
     alpha : float, optional
         Trade-off parameter (0 < alpha < 1)
     G0: array-like, shape (ns,nt), optional
@@ -415,7 +418,7 @@ def semirelaxed_fused_gromov_wasserstein2(M, C1, C2, p, loss_fun='square_loss',
     symmetric : bool, optional
         Either C1 and C2 are to be assumed symmetric or not.
         If let to its default None value, a symmetry test will be conducted.
-        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymetric).
+        Else if set to True (resp. False), C1 and C2 will be assumed symmetric (resp. asymmetric).
     alpha : float, optional
         Trade-off parameter (0 < alpha < 1)
     G0: array-like, shape (ns,nt), optional
@@ -435,7 +438,7 @@ def semirelaxed_fused_gromov_wasserstein2(M, C1, C2, p, loss_fun='square_loss',
     Returns
     -------
     srfgw-divergence : float
-        Semi-relaxed Fused gromov wasserstein divergence for the given parameters.
+        Semi-relaxed Fused Gromov-Wasserstein divergence for the given parameters.
     log : dict
         Log dictionary return only if log==True in parameters.
 
diff --git a/ot/gromov/_utils.py b/ot/gromov/_utils.py
index e842250..ef8cd88 100644
--- a/ot/gromov/_utils.py
+++ b/ot/gromov/_utils.py
@@ -20,7 +20,7 @@ def init_matrix(C1, C2, p, q, loss_fun='square_loss', nx=None):
     r"""Return loss matrices and tensors for Gromov-Wasserstein fast computation
 
     Returns the value of :math:`\mathcal{L}(\mathbf{C_1}, \mathbf{C_2}) \otimes \mathbf{T}` with the
-    selected loss function as the loss function of Gromow-Wasserstein discrepancy.
+    selected loss function as the loss function of Gromov-Wasserstein discrepancy.
 
     The matrices are computed as described in Proposition 1 in :ref:`[12] <references-init-matrix>`
 
@@ -195,7 +195,7 @@ def gwloss(constC, hC1, hC2, T, nx=None):
     Returns
     -------
     loss : float
-        Gromov Wasserstein loss
+        Gromov-Wasserstein loss
 
 
     .. _references-gwloss:
@@ -235,7 +235,7 @@ def gwggrad(constC, hC1, hC2, T, nx=None):
     Returns
     -------
     grad : array-like, shape (`ns`, `nt`)
-           Gromov Wasserstein gradient
+        Gromov-Wasserstein gradient
 
 
     .. _references-gwggrad:
@@ -328,7 +328,7 @@ def init_matrix_semirelaxed(C1, C2, p, loss_fun='square_loss', nx=None):
     r"""Return loss matrices and tensors for semi-relaxed Gromov-Wasserstein fast computation
 
     Returns the value of :math:`\mathcal{L}(\mathbf{C_1}, \mathbf{C_2}) \otimes \mathbf{T}` with the
-    selected loss function as the loss function of semi-relaxed Gromow-Wasserstein discrepancy.
+    selected loss function as the loss function of semi-relaxed Gromov-Wasserstein discrepancy.
 
     The matrices are computed as described in Proposition 1 in :ref:`[12] <references-init-matrix>`
     and adapted to the semi-relaxed problem where the second marginal is not a constant anymore.
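The `init_matrix` docstring points to Proposition 1 of [12] for computing :math:`\mathcal{L}(\mathbf{C_1}, \mathbf{C_2}) \otimes \mathbf{T}` without building the full 4D tensor. A minimal NumPy sketch for the square loss follows, assuming the decomposition :math:`L(a, b) = f_1(a) + f_2(b) - h_1(a) h_2(b)` with :math:`f_1(a) = a^2`, :math:`f_2(b) = b^2`, :math:`h_1(a) = a`, :math:`h_2(b) = 2b`, and a coupling `T` whose marginals are exactly `p` and `q`. The function names are hypothetical; the brute-force version is only there to check the fast formula on tiny inputs.

```python
import numpy as np


def tensor_product_square_loss(C1, C2, p, q, T):
    """Hypothetical sketch of L(C1, C2) (x) T for L(a, b) = (a - b)**2.

    Valid only when T has marginals p (rows) and q (columns).
    """
    # constC[i, j] = sum_k C1[i, k]**2 p[k] + sum_l C2[j, l]**2 q[l]
    constC = (C1 ** 2 @ p)[:, None] + (C2 ** 2 @ q)[None, :]
    # subtract h1(C1) T h2(C2)^T with h1(a) = a, h2(b) = 2b
    return constC - C1 @ T @ (2 * C2).T


def gw_square_loss_bruteforce(C1, C2, T):
    # sum_{i,j,k,l} (C1[i, k] - C2[j, l])**2 * T[i, j] * T[k, l], O(n^4) memory
    L4 = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2
    return np.einsum('ijkl,ij,kl->', L4, T, T)


rng = np.random.default_rng(0)
ns, nt = 5, 4
C1 = rng.random((ns, ns))
C2 = rng.random((nt, nt))
p = np.full(ns, 1 / ns)
q = np.full(nt, 1 / nt)
T = np.outer(p, q)  # independent coupling: marginals are exactly p and q

fast = np.sum(tensor_product_square_loss(C1, C2, p, q, T) * T)
slow = gw_square_loss_bruteforce(C1, C2, T)
print(np.isclose(fast, slow))
```

The GW loss is then the inner product of this matrix with `T`, which is what the `gwloss` return value describes; the gradient returned by `gwggrad` is, up to a constant factor, the same matrix.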
diff --git a/ot/lp/__init__.py b/ot/lp/__init__.py
index 2ff02ab..4952a21 100644
--- a/ot/lp/__init__.py
+++ b/ot/lp/__init__.py
@@ -253,7 +253,7 @@ def emd(a, b, M, numItermax=100000, log=False, center_dual=True, numThreads=1):
         Otherwise returns only the optimal transportation matrix.
     center_dual: boolean, optional (default=True)
         If True, centers the dual potential using function
-        :ref:`center_ot_dual`.
+        :py:func:`ot.lp.center_ot_dual`.
     numThreads: int or "max", optional (default=1, i.e. OpenMP is not used)
         If compiled with OpenMP, chooses the number of threads to parallelize.
         "max" selects the highest number possible.
@@ -418,7 +418,7 @@ def emd2(a, b, M, processes=1,
         If True, returns the optimal transportation matrix in the log.
     center_dual: boolean, optional (default=True)
         If True, centers the dual potential using function
-        :ref:`center_ot_dual`.
+        :py:func:`ot.lp.center_ot_dual`.
     numThreads: int or "max", optional (default=1, i.e. OpenMP is not used)
         If compiled with OpenMP, chooses the number of threads to parallelize.
         "max" selects the highest number possible.
@@ -631,6 +631,7 @@ def free_support_barycenter(measures_locations, measures_weights, X_init, b=None
 
 
     .. _references-free-support-barycenter:
+
     References
     ----------
     .. [20] Cuturi, Marco, and Arnaud Doucet. "Fast computation of Wasserstein barycenters." International Conference on Machine Learning. 2014.
@@ -688,7 +689,7 @@ def free_support_barycenter(measures_locations, measures_weights, X_init, b=None
 def generalized_free_support_barycenter(X_list, a_list, P_list, n_samples_bary, Y_init=None, b=None, weights=None,
                                         numItermax=100, stopThr=1e-7, verbose=False, log=None, numThreads=1, eps=0):
     r"""
-    Solves the free support generalised Wasserstein barycenter problem: finding a barycenter (a discrete measure with
+    Solves the free support generalized Wasserstein barycenter problem: finding a barycenter (a discrete measure with
     a fixed amount of points of uniform weights) whose respective projections fit the input measures.
     More formally:
 
@@ -776,7 +777,7 @@ def generalized_free_support_barycenter(X_list, a_list, P_list, n_samples_bary,
         Y_init = nx.randn(n_samples_bary, d, type_as=X_list[0])
 
     if b is None:
-        b = nx.ones(n_samples_bary, type_as=X_list[0]) / n_samples_bary  # not optimised
+        b = nx.ones(n_samples_bary, type_as=X_list[0]) / n_samples_bary  # not optimized
 
     out = free_support_barycenter(Z_list, a_list, Y_init, b, numItermax=numItermax,
                                   stopThr=stopThr, verbose=verbose, log=log, numThreads=numThreads)
@@ -786,7 +787,7 @@ def generalized_free_support_barycenter(X_list, a_list, P_list, n_samples_bary,
     else:
         Y = out
         log_dict = None
-    Y = Y @ B.T  # return to the Generalised WB formulation
+    Y = Y @ B.T  # return to the Generalized WB formulation
 
     if log:
         return Y, log_dict
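The `center_dual` option of `emd`/`emd2` refers to `ot.lp.center_ot_dual`. Exact OT dual potentials are only defined up to a constant shift, since replacing :math:`(\alpha, \beta)` by :math:`(\alpha - c, \beta + c)` leaves every constraint :math:`\alpha_i + \beta_j \leq M_{ij}` unchanged and, when the two histograms have equal total mass, also leaves the dual objective unchanged. The sketch below illustrates one possible centering rule, choosing `c` so that the weighted sums :math:`\langle a, \alpha \rangle` and :math:`\langle b, \beta \rangle` agree. The function name and this exact rule are assumptions for illustration, not necessarily what POT implements.

```python
import numpy as np


def center_dual_sketch(alpha0, beta0, a, b):
    """Hypothetical sketch: shift OT dual potentials by a constant c.

    Returns (alpha0 - c, beta0 + c) with c chosen so that
    <a, alpha> == <b, beta> (assumed centering rule).
    """
    c = (a @ alpha0 - b @ beta0) / (a.sum() + b.sum())
    return alpha0 - c, beta0 + c


a = np.array([0.5, 0.5])
b = np.array([0.25, 0.75])
alpha0 = np.array([3.0, 1.0])
beta0 = np.array([-2.0, 0.0])
alpha, beta = center_dual_sketch(alpha0, beta0, a, b)
print(a @ alpha, b @ beta)
```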
diff --git a/ot/lp/cvx.py b/ot/lp/cvx.py
index 361ad0f..3f7eb36 100644
--- a/ot/lp/cvx.py
+++ b/ot/lp/cvx.py
@@ -52,7 +52,7 @@ def barycenter(A, M, weights=None, verbose=False, log=False, solver='interior-po
     reg : float
         Regularization term >0
     weights : np.ndarray (n,)
-        Weights of each histogram a_i on the simplex (barycentric coodinates)
+        Weights of each histogram a_i on the simplex (barycentric coordinates)
     verbose : bool, optional
         Print information along iterations
     log : bool, optional
diff --git a/ot/lp/solver_1d.py b/ot/lp/solver_1d.py
index 840801a..8d841ec 100644
--- a/ot/lp/solver_1d.py
+++ b/ot/lp/solver_1d.py
@@ -37,7 +37,7 @@ def quantile_function(qs, cws, xs):
     n = xs.shape[0]
     if nx.__name__ == 'torch':
         # this is to ensure the best performance for torch searchsorted
-        # and avoid a warninng related to non-contiguous arrays
+        # and avoid a warning related to non-contiguous arrays
         cws = cws.T.contiguous()
         qs = qs.T.contiguous()
     else:
@@ -145,6 +145,7 @@ def emd_1d(x_a, x_b, a=None, b=None, metric='sqeuclidean', p=1., dense=True,
         s.t. \gamma 1 = a,
              \gamma^T 1= b,
              \gamma\geq 0
+
     where :
 
     - d is the metric
@@ -283,6 +284,7 @@ def emd2_1d(x_a, x_b, a=None, b=None, metric='sqeuclidean', p=1., dense=True,
         s.t. \gamma 1 = a,
              \gamma^T 1= b,
              \gamma\geq 0
+
     where :
 
     - d is the metric
@@ -464,7 +466,7 @@ def derivative_cost_on_circle(theta, u_values, v_values, u_cdf, v_cdf, p=2):
 
     if nx.__name__ == 'torch':
         # this is to ensure the best performance for torch searchsorted
-        # and avoid a warninng related to non-contiguous arrays
+        # and avoid a warning related to non-contiguous arrays
         u_cdf = u_cdf.contiguous()
         v_cdf_theta = v_cdf_theta.contiguous()
 
@@ -478,7 +480,7 @@ def derivative_cost_on_circle(theta, u_values, v_values, u_cdf, v_cdf, p=2):
 
     if nx.__name__ == 'torch':
         # this is to ensure the best performance for torch searchsorted
-        # and avoid a warninng related to non-contiguous arrays
+        # and avoid a warning related to non-contiguous arrays
         u_cdfm = u_cdfm.contiguous()
         v_cdf_theta = v_cdf_theta.contiguous()
 
@@ -665,8 +667,8 @@ def binary_search_circle(u_values, v_values, u_weights=None, v_weights=None, p=1
 
     if u_values.shape[1] != v_values.shape[1]:
         raise ValueError(
-            "u and v must have the same number of batchs {} and {} respectively given".format(u_values.shape[1],
-                                                                                              v_values.shape[1]))
+            "u and v must have the same number of batches {} and {} respectively given".format(u_values.shape[1],
+                                                                                               v_values.shape[1]))
 
     u_values = u_values % 1
     v_values = v_values % 1
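`quantile_function(qs, cws, xs)` evaluates the quantile function of a discrete 1D measure from its cumulative weights `cws` and sorted support `xs`, which is why the torch branch cares about `searchsorted` performance. A single-measure NumPy sketch follows; it assumes `xs` is sorted and `cws` ends at 1, whereas the library version works on batched arrays across backends. `quantile_1d` is a hypothetical name.

```python
import numpy as np


def quantile_1d(qs, cws, xs):
    """Hypothetical single-measure sketch of a discrete quantile function.

    qs  : levels in [0, 1]
    cws : cumulative weights of the sorted support (last entry ~ 1)
    xs  : sorted support points
    """
    # First index i with cws[i] >= t: the generalized inverse of the CDF.
    idx = np.searchsorted(cws, qs, side='left')
    # Guard against rounding that puts t slightly above cws[-1].
    idx = np.clip(idx, 0, len(xs) - 1)
    return xs[idx]


xs = np.array([0.0, 1.0, 2.0])
cws = np.cumsum(np.full(3, 1 / 3))
print(quantile_1d(np.array([0.1, 0.5, 0.9]), cws, xs))
```

The clipping step matters in practice: accumulated weights often sum to slightly less than one, and a level of exactly 1 would otherwise index past the end of `xs`.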
diff --git a/ot/optim.py b/ot/optim.py
index b15c77b..9e65e81 100644
--- a/ot/optim.py
+++ b/ot/optim.py
@@ -138,6 +138,7 @@ def generic_conditional_gradient(a, b, M, f, df, reg1, reg2, lp_solver, line_sea
              \gamma^T \mathbf{1} &= \mathbf{b} (optional constraint)
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`ns`, `nt`) metric cost matrix
@@ -157,6 +158,7 @@ def generic_conditional_gradient(a, b, M, f, df, reg1, reg2, lp_solver, line_sea
              \gamma^T \mathbf{1} &= \mathbf{b}
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\Omega` is the entropic regularization term :math:`\Omega(\gamma)=\sum_{i,j} \gamma_{i,j}\log(\gamma_{i,j})`
@@ -224,7 +226,7 @@ def generic_conditional_gradient(a, b, M, f, df, reg1, reg2, lp_solver, line_sea
 
     See Also
     --------
-    ot.lp.emd : Unregularized optimal ransport
+    ot.lp.emd : Unregularized optimal transport
     ot.bregman.sinkhorn : Entropic regularized optimal transport
     """
     a, b, M, G0 = list_to_array(a, b, M, G0)
@@ -325,6 +327,7 @@ def cg(a, b, M, reg, f, df, G0=None, line_search=line_search_armijo,
              \gamma^T \mathbf{1} &= \mathbf{b}
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`ns`, `nt`) metric cost matrix
@@ -380,7 +383,7 @@ def cg(a, b, M, reg, f, df, G0=None, line_search=line_search_armijo,
 
     See Also
     --------
-    ot.lp.emd : Unregularized optimal ransport
+    ot.lp.emd : Unregularized optimal transport
     ot.bregman.sinkhorn : Entropic regularized optimal transport
 
     """
@@ -407,6 +410,7 @@ def semirelaxed_cg(a, b, M, reg, f, df, G0=None, line_search=line_search_armijo,
         s.t. \ \gamma \mathbf{1} &= \mathbf{a}
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`ns`, `nt`) metric cost matrix
@@ -492,6 +496,7 @@ def gcg(a, b, M, reg1, reg2, f, df, G0=None, numItermax=10,
              \gamma^T \mathbf{1} &= \mathbf{b}
 
              \gamma &\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`ns`, `nt`) metric cost matrix
diff --git a/ot/partial.py b/ot/partial.py
index bf4119d..43f3362 100755
--- a/ot/partial.py
+++ b/ot/partial.py
@@ -516,7 +516,7 @@ def partial_gromov_wasserstein(C1, C2, p, q, m=None, nb_dummies=1, G0=None,
     nb_dummies : int, optional
         Number of dummy points to add (avoid instabilities in the EMD solver)
     G0 : ndarray, shape (ns, nt), optional
-        Initialisation of the transportation matrix
+        Initialization of the transportation matrix
     thres : float, optional
         quantile of the gradient matrix to populate the cost matrix when 0
         (default: 1)
@@ -686,7 +686,7 @@ def partial_gromov_wasserstein2(C1, C2, p, q, m=None, nb_dummies=1, G0=None,
     C1 : ndarray, shape (ns, ns)
         Metric cost matrix in the source space
     C2 : ndarray, shape (nt, nt)
-        Metric costfr matrix in the target space
+        Metric cost matrix in the target space
     p : ndarray, shape (ns,)
         Distribution in the source space
     q : ndarray, shape (nt,)
@@ -697,7 +697,7 @@ def partial_gromov_wasserstein2(C1, C2, p, q, m=None, nb_dummies=1, G0=None,
     nb_dummies : int, optional
         Number of dummy points to add (avoid instabilities in the EMD solver)
     G0 : ndarray, shape (ns, nt), optional
-        Initialisation of the transportation matrix
+        Initialization of the transportation matrix
     thres : float, optional
         quantile of the gradient matrix to populate the cost matrix when 0
         (default: 1)
@@ -958,15 +958,15 @@ def entropic_partial_gromov_wasserstein(C1, C2, p, q, reg, m=None, G0=None,
     - `m` is the amount of mass to be transported
 
     The formulation of the GW problem has been proposed in
-    :ref:`[12] <references-entropic-partial-gromov-wassertein>` and the
-    partial GW in :ref:`[29] <references-entropic-partial-gromov-wassertein>`
+    :ref:`[12] <references-entropic-partial-gromov-wasserstein>` and the
+    partial GW in :ref:`[29] <references-entropic-partial-gromov-wasserstein>`
 
     Parameters
     ----------
     C1 : ndarray, shape (ns, ns)
         Metric cost matrix in the source space
     C2 : ndarray, shape (nt, nt)
-        Metric costfr matrix in the target space
+        Metric cost matrix in the target space
     p : ndarray, shape (ns,)
         Distribution in the source space
     q : ndarray, shape (nt,)
@@ -977,7 +977,7 @@ def entropic_partial_gromov_wasserstein(C1, C2, p, q, reg, m=None, G0=None,
         Amount of mass to be transported (default:
         :math:`\min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}`)
     G0 : ndarray, shape (ns, nt), optional
-        Initialisation of the transportation matrix
+        Initialization of the transportation matrix
     numItermax : int, optional
         Max number of iterations
     tol : float, optional
@@ -1016,7 +1016,7 @@ def entropic_partial_gromov_wasserstein(C1, C2, p, q, reg, m=None, G0=None,
         log dictionary returned only if `log` is `True`
 
 
-    .. _references-entropic-partial-gromov-wassertein:
+    .. _references-entropic-partial-gromov-wasserstein:
     References
     ----------
     .. [12] Peyré, Gabriel, Marco Cuturi, and Justin Solomon,
@@ -1107,8 +1107,8 @@ def entropic_partial_gromov_wasserstein2(C1, C2, p, q, reg, m=None, G0=None,
     - `m` is the amount of mass to be transported
 
     The formulation of the GW problem has been proposed in
-    :ref:`[12] <references-entropic-partial-gromov-wassertein2>` and the
-    partial GW in :ref:`[29] <references-entropic-partial-gromov-wassertein2>`
+    :ref:`[12] <references-entropic-partial-gromov-wasserstein2>` and the
+    partial GW in :ref:`[29] <references-entropic-partial-gromov-wasserstein2>`
 
 
     Parameters
@@ -1116,7 +1116,7 @@ def entropic_partial_gromov_wasserstein2(C1, C2, p, q, reg, m=None, G0=None,
     C1 : ndarray, shape (ns, ns)
         Metric cost matrix in the source space
     C2 : ndarray, shape (nt, nt)
-        Metric costfr matrix in the target space
+        Metric cost matrix in the target space
     p : ndarray, shape (ns,)
         Distribution in the source space
     q : ndarray, shape (nt,)
@@ -1127,7 +1127,7 @@ def entropic_partial_gromov_wasserstein2(C1, C2, p, q, reg, m=None, G0=None,
         Amount of mass to be transported (default:
         :math:`\min\{\|\mathbf{p}\|_1, \|\mathbf{q}\|_1\}`)
     G0 : ndarray, shape (ns, nt), optional
-        Initialisation of the transportation matrix
+        Initialization of the transportation matrix
     numItermax : int, optional
         Max number of iterations
     tol : float, optional
@@ -1159,7 +1159,7 @@ def entropic_partial_gromov_wasserstein2(C1, C2, p, q, reg, m=None, G0=None,
     1.87
 
 
-    .. _references-entropic-partial-gromov-wassertein2:
+    .. _references-entropic-partial-gromov-wasserstein2:
     References
     ----------
     .. [12] Peyré, Gabriel, Marco Cuturi, and Justin Solomon,
diff --git a/ot/plot.py b/ot/plot.py
index 8ade2eb..4b1bfb1 100644
--- a/ot/plot.py
+++ b/ot/plot.py
@@ -3,7 +3,7 @@ Functions for plotting OT matrices
 
 .. warning::
     Note that by default the module is not import in :mod:`ot`. In order to
-    use it you need to explicitely import :mod:`ot.plot`
+    use it you need to explicitly import :mod:`ot.plot`
 
 
 """
diff --git a/ot/regpath.py b/ot/regpath.py
index e745288..8a9b6d8 100644
--- a/ot/regpath.py
+++ b/ot/regpath.py
@@ -399,7 +399,7 @@ def compute_next_removal(phi, delta, current_gamma):
 
 def complement_schur(M_current, b, d, id_pop):
     r""" This function computes the inverse of the design matrix in the \
-    regularization path using the  Schur complement. Two cases may arise:
+    regularization path using the Schur complement. Two cases may arise:
 
     Case 1: one variable is added to the active set
 
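Case 1 of `complement_schur` (one variable added to the active set) rests on the standard block-inverse identity: given :math:`A^{-1}`, the inverse of :math:`\begin{pmatrix} A & b \\ b^T & d \end{pmatrix}` only needs the scalar Schur complement :math:`s = d - b^T A^{-1} b`. The NumPy sketch below shows that identity for a symmetric `A`; it does not reproduce the signature of `complement_schur` or its removal case, and `add_variable_inverse` is a hypothetical name.

```python
import numpy as np


def add_variable_inverse(A_inv, b, d):
    """Hypothetical sketch: inverse of [[A, b], [b^T, d]] from A^{-1}.

    A_inv : (k, k) inverse of the current symmetric matrix A
    b     : (k,) new off-diagonal column
    d     : scalar new diagonal entry
    """
    Ab = A_inv @ b                 # A^{-1} b (equals (b^T A^{-1})^T since A is symmetric)
    s = d - b @ Ab                 # Schur complement of A, a scalar
    k = A_inv.shape[0]
    M_inv = np.empty((k + 1, k + 1))
    M_inv[:k, :k] = A_inv + np.outer(Ab, Ab) / s
    M_inv[:k, k] = -Ab / s
    M_inv[k, :k] = -Ab / s
    M_inv[k, k] = 1.0 / s
    return M_inv


rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4))
G = X.T @ X                        # symmetric positive definite Gram matrix
A_inv = np.linalg.inv(G[:3, :3])
M_inv = add_variable_inverse(A_inv, G[:3, 3], G[3, 3])
print(np.allclose(M_inv, np.linalg.inv(G)))
```

The update costs :math:`O(k^2)` instead of the :math:`O(k^3)` of a fresh inversion, which is the point of using the Schur complement along a regularization path.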
diff --git a/ot/sliced.py b/ot/sliced.py
index fa2141e..3a1644d 100644
--- a/ot/sliced.py
+++ b/ot/sliced.py
@@ -173,7 +173,7 @@ def max_sliced_wasserstein_distance(X_s, X_t, a=None, b=None, n_projections=50,
 
     where :
 
-    - :math:`\theta_\# \mu` stands for the pushforwars of the projection :math:`\mathbb{R}^d \ni X \mapsto \langle \theta, X \rangle`
+    - :math:`\theta_\# \mu` stands for the pushforward of :math:`\mu` by the projection :math:`\mathbb{R}^d \ni X \mapsto \langle \theta, X \rangle`
 
 
     Parameters
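The max-sliced Wasserstein docstring takes the maximum, over directions :math:`\theta`, of the 1D Wasserstein distance between the pushforwards :math:`\theta_\# \mu` and :math:`\theta_\# \nu`. A minimal NumPy sketch follows; it approximates the maximum with random unit directions and assumes both point clouds have the same number of uniformly weighted samples, so 1D optimal transport reduces to matching sorted projections. `max_sliced_w_sketch` is a hypothetical name; the POT function also handles sample weights `a`, `b` and other options.

```python
import numpy as np


def max_sliced_w_sketch(X_s, X_t, n_projections=50, p=2, seed=0):
    """Hypothetical sketch of max-sliced W_p between two uniform point clouds.

    Assumes X_s and X_t have the same number of rows (uniform weights).
    """
    rng = np.random.default_rng(seed)
    d = X_s.shape[1]
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    # Column k holds <theta_k, x> for every sample x: samples of the pushforward.
    proj_s = np.sort(X_s @ theta.T, axis=0)
    proj_t = np.sort(X_t @ theta.T, axis=0)
    # 1D W_p^p per direction: mean of |sorted differences|^p.
    wpp = np.mean(np.abs(proj_s - proj_t) ** p, axis=0)
    # Maximum over the sampled directions, then the p-th root.
    return wpp.max() ** (1.0 / p)


rng = np.random.default_rng(42)
X = rng.standard_normal((200, 2))
print(max_sliced_w_sketch(X, X + np.array([3.0, 0.0])))
```

For a pure translation the projected clouds differ by the constant :math:`\langle \theta, \text{shift} \rangle`, so the sketch returns a value just below the shift norm (3 here), reached only for :math:`\theta` aligned with the shift.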
diff --git a/ot/unbalanced.py b/ot/unbalanced.py
index a71a0dd..9584d77 100644
--- a/ot/unbalanced.py
+++ b/ot/unbalanced.py
@@ -121,7 +121,7 @@ def sinkhorn_unbalanced(a, b, M, reg, reg_m, method='sinkhorn', numItermax=1000,
     ot.unbalanced.sinkhorn_stabilized_unbalanced:
         Unbalanced Stabilized sinkhorn :ref:`[9, 10] <references-sinkhorn-unbalanced>`
     ot.unbalanced.sinkhorn_reg_scaling_unbalanced:
-        Unbalanced Sinkhorn with epslilon scaling :ref:`[9, 10] <references-sinkhorn-unbalanced>`
+        Unbalanced Sinkhorn with epsilon scaling :ref:`[9, 10] <references-sinkhorn-unbalanced>`
 
     """
 
@@ -163,6 +163,7 @@ def sinkhorn_unbalanced2(a, b, M, reg, reg_m, method='sinkhorn',
 
         s.t.
              \gamma\geq 0
+
     where :
 
     - :math:`\mathbf{M}` is the (`dim_a`, `dim_b`) metric cost matrix
@@ -240,7 +241,7 @@ def sinkhorn_unbalanced2(a, b, M, reg, reg_m, method='sinkhorn',
     --------
     ot.unbalanced.sinkhorn_knopp : Unbalanced Classic Sinkhorn :ref:`[10] <references-sinkhorn-unbalanced2>`
     ot.unbalanced.sinkhorn_stabilized: Unbalanced Stabilized sinkhorn :ref:`[9, 10] <references-sinkhorn-unbalanced2>`
-    ot.unbalanced.sinkhorn_reg_scaling: Unbalanced Sinkhorn with epslilon scaling :ref:`[9, 10] <references-sinkhorn-unbalanced2>`
+    ot.unbalanced.sinkhorn_reg_scaling: Unbalanced Sinkhorn with epsilon scaling :ref:`[9, 10] <references-sinkhorn-unbalanced2>`
 
     """
     b = list_to_array(b)
@@ -492,7 +493,7 @@ def sinkhorn_stabilized_unbalanced(a, b, M, reg, reg_m, tau=1e5, numItermax=1000
     reg_m: float
         Marginal relaxation term > 0
     tau : float
-        thershold for max value in u or v for log scaling
+        threshold for max value in u or v for log scaling
     numItermax : int, optional
         Max number of iterations
     stopThr : float, optional
@@ -699,7 +700,7 @@ def barycenter_unbalanced_stabilized(A, M, reg, reg_m, weights=None, tau=1e3,
     tau : float
         Stabilization threshold for log domain absorption.
     weights : array-like (n_hists,) optional
-        Weight of each distribution (barycentric coodinates)
+        Weight of each distribution (barycentric coordinates)
         If None, uniform weights are used.
     numItermax : int, optional
         Max number of iterations
diff --git a/test/test_ot.py b/test/test_ot.py
index f2338ac..068080b 100644
--- a/test/test_ot.py
+++ b/test/test_ot.py
@@ -299,7 +299,7 @@ def test_lp_barycenter():
     A = np.hstack((a1, a2))
     M = np.array([[0, 1.0, 4.0], [1.0, 0, 1.0], [4.0, 1.0, 0]])
 
-    # obvious barycenter between two diracs
+    # obvious barycenter between two Diracs
     bary0 = np.array([0, 1.0, 0])
 
     bary = ot.lp.barycenter(A, M, [.5, .5])
@@ -314,7 +314,7 @@ def test_free_support_barycenter():
 
     X_init = np.array([-12.]).reshape((1, 1))
 
-    # obvious barycenter location between two diracs
+    # obvious barycenter location between two Diracs
     bar_locations = np.array([0.]).reshape((1, 1))
 
     X = ot.lp.free_support_barycenter(measures_locations, measures_weights, X_init)
@@ -348,7 +348,7 @@ def test_generalised_free_support_barycenter():
 
     Y_init = np.array([-12., 7.]).reshape((1, 2))
 
-    # obvious barycenter location between two 2D diracs
+    # obvious barycenter location between two 2D Diracs
     Y_true = np.array([0., .0]).reshape((1, 2))
 
     # test without log and no init
@@ -387,7 +387,7 @@ def test_lp_barycenter_cvxopt():
     A = np.hstack((a1, a2))
     M = np.array([[0, 1.0, 4.0], [1.0, 0, 1.0], [4.0, 1.0, 0]])
 
-    # obvious barycenter between two diracs
+    # obvious barycenter between two Diracs
     bary0 = np.array([0, 1.0, 0])
 
     bary = ot.lp.barycenter(A, M, [.5, .5], solver=None)