summaryrefslogtreecommitdiff
path: root/src/python/doc
diff options
context:
space:
mode:
authorROUVREAU Vincent <vincent.rouvreau@inria.fr>2020-06-22 09:57:47 +0200
committerROUVREAU Vincent <vincent.rouvreau@inria.fr>2020-06-22 09:57:47 +0200
commit811c8c225c965577689a0dacf87f643254db5516 (patch)
treefb7e97e950f4f9d81a0ec46c935254b4f74a0c43 /src/python/doc
parent608248f0d64e96d29bec3e9f8f3d1ba84fe83e49 (diff)
parentcec4a5d7df6d5ed43511e94f9db580489979105a (diff)
Merge branch 'master' into alpha_complex_3d_python
Diffstat (limited to 'src/python/doc')
-rw-r--r--src/python/doc/clustering.inc12
-rw-r--r--src/python/doc/clustering.rst73
-rw-r--r--src/python/doc/img/spiral-color.pngbin0 -> 222425 bytes
-rw-r--r--src/python/doc/index.rst5
4 files changed, 90 insertions, 0 deletions
diff --git a/src/python/doc/clustering.inc b/src/python/doc/clustering.inc
new file mode 100644
index 00000000..2d07ae88
--- /dev/null
+++ b/src/python/doc/clustering.inc
@@ -0,0 +1,12 @@
+.. table::
+ :widths: 30 40 30
+
+ +--------------------------+-------------------------------------------------------+---------------------------------+
+ | .. figure:: | Clustering tools. | :Author: Marc Glisse |
+ | img/spiral-color.png | | |
+ | | | :Since: GUDHI 3.3.0 |
+ | | | |
+ | | | :License: MIT |
+ +--------------------------+-------------------------------------------------------+---------------------------------+
+ | * :doc:`clustering` |
+ +--------------------------+-----------------------------------------------------------------------------------------+
diff --git a/src/python/doc/clustering.rst b/src/python/doc/clustering.rst
new file mode 100644
index 00000000..c5a57d3c
--- /dev/null
+++ b/src/python/doc/clustering.rst
@@ -0,0 +1,73 @@
+:orphan:
+
+.. To get rid of WARNING: document isn't included in any toctree
+
+=================
+Clustering manual
+=================
+
+We provide an implementation of ToMATo :cite:`tomato`, a persistence-based clustering algorithm. In short, this algorithm uses a density estimator and a neighborhood graph, starts with a mode-seeking phase (naive hill-climbing) to build initial clusters, and finishes by merging clusters based on their prominence.
+
+The merging phase depends on a parameter, which is the minimum prominence a cluster needs to avoid getting merged into another, bigger cluster. This parameter determines the number of clusters, and for convenience we allow you to choose instead the number of clusters. Decreasing the prominence threshold defines a hierarchy of clusters: if 2 points are in separate clusters when we have k clusters, they are still in different clusters for k+1 clusters.
+
+As a by-product, we produce the persistence diagram of the merge tree of the initial clusters. This is a convenient graphical tool to help decide how many clusters we want.
+
+.. plot::
+ :context:
+ :include-source:
+
+ import gudhi
+ f = open(gudhi.__root_source_dir__ + '/data/points/spiral_2d.csv', 'r')
+ import numpy as np
+ data = np.loadtxt(f)
+ import matplotlib.pyplot as plt
+ plt.scatter(data[:,0],data[:,1],marker='.',s=1)
+ plt.show()
+
+.. plot::
+ :context: close-figs
+ :include-source:
+
+ from gudhi.clustering.tomato import Tomato
+ t = Tomato()
+ t.fit(data)
+ t.plot_diagram()
+
+As one can see in `t.n_clusters_`, the algorithm found 6316 initial clusters. The diagram shows their prominence as their distance to the diagonal. There is always one point infinitely far: there is at least one cluster. Among the others, one point seems significantly farther from the diagonal than the others, which indicates that splitting the points into 2 clusters may be a sensible idea.
+
+.. plot::
+ :context: close-figs
+ :include-source:
+
+ t.n_clusters_=2
+ plt.scatter(data[:,0],data[:,1],marker='.',s=1,c=t.labels_)
+ plt.show()
+
+Of course this is just the result for one set of parameters. We can ask for a different density estimator and a different neighborhood graph computed with different parameters.
+
+.. plot::
+ :context: close-figs
+ :include-source:
+
+ t = Tomato(density_type='DTM', k=100)
+ t.fit(data)
+ t.plot_diagram()
+
+Makes the number of clusters clearer, and changes a bit the shape of the clusters.
+
+A quick look at the corresponding density estimate
+
+.. plot::
+ :context: close-figs
+ :include-source:
+
+ plt.scatter(data[:,0],data[:,1],marker='.',s=1,c=t.weights_)
+ plt.show()
+
+The code provides a few density estimators and graph constructions for convenience when first experimenting, but it is actually expected that advanced users provide their own graph and density estimates instead of point coordinates.
+
+Since the algorithm essentially computes basins of attraction, it is also encouraged to use it on functions that do not represent densities at all.
+
+.. autoclass:: gudhi.clustering.tomato.Tomato
+ :members:
+ :special-members: __init__
diff --git a/src/python/doc/img/spiral-color.png b/src/python/doc/img/spiral-color.png
new file mode 100644
index 00000000..21b62dfc
--- /dev/null
+++ b/src/python/doc/img/spiral-color.png
Binary files differ
diff --git a/src/python/doc/index.rst b/src/python/doc/index.rst
index 05bc18b4..040e57a4 100644
--- a/src/python/doc/index.rst
+++ b/src/python/doc/index.rst
@@ -86,3 +86,8 @@ Point cloud utilities
*********************
.. include:: point_cloud_sum.inc
+
+Clustering
+**********
+
+.. include:: clustering.inc