summaryrefslogtreecommitdiff
path: root/src/python/doc
diff options
context:
space:
mode:
Diffstat (limited to 'src/python/doc')
-rw-r--r--src/python/doc/alpha_complex_user.rst51
-rw-r--r--src/python/doc/clustering.inc12
-rw-r--r--src/python/doc/clustering.rst73
-rw-r--r--src/python/doc/img/spiral-color.pngbin0 -> 222425 bytes
-rw-r--r--src/python/doc/index.rst5
5 files changed, 102 insertions, 39 deletions
diff --git a/src/python/doc/alpha_complex_user.rst b/src/python/doc/alpha_complex_user.rst
index 097470c1..fffcb3db 100644
--- a/src/python/doc/alpha_complex_user.rst
+++ b/src/python/doc/alpha_complex_user.rst
@@ -34,8 +34,8 @@ Remarks
the computation of filtration values can exceptionally be arbitrarily bad. In all cases, we still guarantee that the
output is a valid filtration (faces have a filtration value no larger than their cofaces).
* For performances reasons, it is advised to use Alpha_complex with `CGAL <installation.html#cgal>`_ :math:`\geq` 5.0.0.
-
-For performances reasons, it is advised to use CGAL :math:`\geq` 5.0.0.
+* The vertices in the output simplex tree are not guaranteed to match the order of the input points. One can use
+ :func:`~gudhi.AlphaComplex.get_point` to get the initial point back.
Example from points
-------------------
@@ -178,49 +178,22 @@ In the following example, a threshold of :math:`\alpha^2 = 32.0` is used.
Example from OFF file
^^^^^^^^^^^^^^^^^^^^^
-This example builds the Delaunay triangulation from the points given by an OFF file, and initializes the alpha complex
-with it.
-
+This example builds the alpha complex from 300 random points on a 2-torus.
-Then, it is asked to display information about the alpha complex:
+Then, it computes the persistence diagram and displays it:
-.. testcode::
+.. plot::
+ :include-source:
+ import matplotlib.pyplot as plt
import gudhi
alpha_complex = gudhi.AlphaComplex(off_file=gudhi.__root_source_dir__ + \
- '/data/points/alphacomplexdoc.off')
- simplex_tree = alpha_complex.create_simplex_tree(max_alpha_square=32.0)
+ '/data/points/tore3D_300.off')
+ simplex_tree = alpha_complex.create_simplex_tree()
result_str = 'Alpha complex is of dimension ' + repr(simplex_tree.dimension()) + ' - ' + \
repr(simplex_tree.num_simplices()) + ' simplices - ' + \
repr(simplex_tree.num_vertices()) + ' vertices.'
print(result_str)
- fmt = '%s -> %.2f'
- for filtered_value in simplex_tree.get_filtration():
- print(fmt % tuple(filtered_value))
-
-the program output is:
-
-.. testoutput::
-
- Alpha complex is of dimension 2 - 20 simplices - 7 vertices.
- [0] -> 0.00
- [1] -> 0.00
- [2] -> 0.00
- [3] -> 0.00
- [4] -> 0.00
- [5] -> 0.00
- [6] -> 0.00
- [2, 3] -> 6.25
- [4, 5] -> 7.25
- [0, 2] -> 8.50
- [0, 1] -> 9.25
- [1, 3] -> 10.00
- [1, 2] -> 11.25
- [1, 2, 3] -> 12.50
- [0, 1, 2] -> 13.00
- [5, 6] -> 13.25
- [2, 4] -> 20.00
- [4, 6] -> 22.74
- [4, 5, 6] -> 22.74
- [3, 6] -> 30.25
-
+ diag = simplex_tree.persistence()
+ gudhi.plot_persistence_diagram(diag)
+ plt.show()
diff --git a/src/python/doc/clustering.inc b/src/python/doc/clustering.inc
new file mode 100644
index 00000000..2d07ae88
--- /dev/null
+++ b/src/python/doc/clustering.inc
@@ -0,0 +1,12 @@
+.. table::
+ :widths: 30 40 30
+
+ +--------------------------+-------------------------------------------------------+---------------------------------+
+ | .. figure:: | Clustering tools. | :Author: Marc Glisse |
+ | img/spiral-color.png | | |
+ | | | :Since: GUDHI 3.3.0 |
+ | | | |
+ | | | :License: MIT |
+ +--------------------------+-------------------------------------------------------+---------------------------------+
+ | * :doc:`clustering` |
+ +--------------------------+-----------------------------------------------------------------------------------------+
diff --git a/src/python/doc/clustering.rst b/src/python/doc/clustering.rst
new file mode 100644
index 00000000..c5a57d3c
--- /dev/null
+++ b/src/python/doc/clustering.rst
@@ -0,0 +1,73 @@
+:orphan:
+
+.. To get rid of WARNING: document isn't included in any toctree
+
+=================
+Clustering manual
+=================
+
+We provide an implementation of ToMATo :cite:`tomato`, a persistence-based clustering algorithm. In short, this algorithm uses a density estimator and a neighborhood graph, starts with a mode-seeking phase (naive hill-climbing) to build initial clusters, and finishes by merging clusters based on their prominence.
+
+The merging phase depends on a parameter, which is the minimum prominence a cluster needs to avoid getting merged into another, bigger cluster. This parameter determines the number of clusters, and for convenience we allow you to choose instead the number of clusters. Decreasing the prominence threshold defines a hierarchy of clusters: if 2 points are in separate clusters when we have k clusters, they are still in different clusters for k+1 clusters.
+
+As a by-product, we produce the persistence diagram of the merge tree of the initial clusters. This is a convenient graphical tool to help decide how many clusters we want.
+
+.. plot::
+ :context:
+ :include-source:
+
+ import gudhi
+ f = open(gudhi.__root_source_dir__ + '/data/points/spiral_2d.csv', 'r')
+ import numpy as np
+ data = np.loadtxt(f)
+ import matplotlib.pyplot as plt
+ plt.scatter(data[:,0],data[:,1],marker='.',s=1)
+ plt.show()
+
+.. plot::
+ :context: close-figs
+ :include-source:
+
+ from gudhi.clustering.tomato import Tomato
+ t = Tomato()
+ t.fit(data)
+ t.plot_diagram()
+
+As one can see in `t.n_clusters_`, the algorithm found 6316 initial clusters. The diagram shows their prominence as their distance to the diagonal. There is always one point infinitely far: there is at least one cluster. Among the others, one point seems significantly farther from the diagonal than the others, which indicates that splitting the points into 2 clusters may be a sensible idea.
+
+.. plot::
+ :context: close-figs
+ :include-source:
+
+ t.n_clusters_=2
+ plt.scatter(data[:,0],data[:,1],marker='.',s=1,c=t.labels_)
+ plt.show()
+
+Of course this is just the result for one set of parameters. We can ask for a different density estimator and a different neighborhood graph computed with different parameters.
+
+.. plot::
+ :context: close-figs
+ :include-source:
+
+ t = Tomato(density_type='DTM', k=100)
+ t.fit(data)
+ t.plot_diagram()
+
+Makes the number of clusters clearer, and changes a bit the shape of the clusters.
+
+A quick look at the corresponding density estimate
+
+.. plot::
+ :context: close-figs
+ :include-source:
+
+ plt.scatter(data[:,0],data[:,1],marker='.',s=1,c=t.weights_)
+ plt.show()
+
+The code provides a few density estimators and graph constructions for convenience when first experimenting, but it is actually expected that advanced users provide their own graph and density estimates instead of point coordinates.
+
+Since the algorithm essentially computes basins of attraction, it is also encouraged to use it on functions that do not represent densities at all.
+
+.. autoclass:: gudhi.clustering.tomato.Tomato
+ :members:
+ :special-members: __init__
diff --git a/src/python/doc/img/spiral-color.png b/src/python/doc/img/spiral-color.png
new file mode 100644
index 00000000..21b62dfc
--- /dev/null
+++ b/src/python/doc/img/spiral-color.png
Binary files differ
diff --git a/src/python/doc/index.rst b/src/python/doc/index.rst
index 05bc18b4..040e57a4 100644
--- a/src/python/doc/index.rst
+++ b/src/python/doc/index.rst
@@ -86,3 +86,8 @@ Point cloud utilities
*********************
.. include:: point_cloud_sum.inc
+
+Clustering
+**********
+
+.. include:: clustering.inc