:orphan:

.. To get rid of WARNING: document isn't included in any toctree

Cubical complex user manual
===========================

Definition
----------

.. list-table::
   :widths: 25 50 25
   :header-rows: 0

   * - :Author: Pawel Dlotko
     - :Since: GUDHI 2.0.0
     - :License: MIT
   * - :doc:`cubical_complex_user`
     - * :doc:`cubical_complex_ref`
       * :doc:`periodic_cubical_complex_ref`
       * :doc:`cubical_complex_sklearn_itf_ref`
     -

The cubical complex is an example of a structured complex useful in computational mathematics (especially rigorous
numerics) and image analysis.

An *elementary interval* is an interval of the form :math:`[n,n+1]` or :math:`[n,n]`, for :math:`n \in \mathbb{Z}`.
The first one is called *non-degenerate*, while the second one is a *degenerate* interval. The *boundary of an
elementary interval* is the chain :math:`\partial [n,n+1] = [n+1,n+1]-[n,n]` for a non-degenerate elementary interval
and :math:`\partial [n,n] = 0` for a degenerate elementary interval. An *elementary cube* :math:`C` is a product of
elementary intervals, :math:`C=I_1 \times \ldots \times I_n`. The *embedding dimension* of a cube is :math:`n`, the
number of elementary intervals (degenerate or not) in the product. The *dimension of a cube*
:math:`C=I_1 \times \ldots \times I_n` is the number of non-degenerate elementary intervals in the product. The
*boundary of a cube* :math:`C=I_1 \times \ldots \times I_n` is the chain obtained in the following way:

.. math::

   \partial C = (\partial I_1 \times \ldots \times I_n) + (I_1 \times \partial I_2 \times \ldots \times I_n) + \ldots + (I_1 \times I_2 \times \ldots \times \partial I_n).

A *cubical complex* :math:`\mathcal{K}` is a collection of cubes closed under the operation of taking boundary (i.e.
the boundary of every cube from the collection is in the collection). A cube :math:`C` in a cubical complex
:math:`\mathcal{K}` is *maximal* if it is not in the boundary of any other cube in :math:`\mathcal{K}`.
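
The definitions above translate almost directly into code. The following Python sketch is an illustration only, not
part of GUDHI: the function names and the encoding of an interval :math:`[a,b]` as the tuple ``(a, b)`` are
assumptions of this example. It computes the dimension, the embedding dimension and the boundary of elementary cubes.
Since the boundary formula above carries no sign factors, the sketch works with :math:`\mathbb{Z}_2` coefficients,
where a chain is simply a set of cubes and adding the same cube twice cancels it.

.. code-block:: python

    # Hypothetical encoding: an elementary interval is a pair (a, b) with b == a
    # (degenerate) or b == a + 1 (non-degenerate); an elementary cube is a tuple
    # of elementary intervals; a Z_2 chain is a set of cubes.


    def interval_boundary(interval):
        """Faces of an elementary interval; signs are dropped (Z_2 coefficients)."""
        a, b = interval
        if a == b:
            return []              # the boundary of a degenerate interval is 0
        return [(b, b), (a, a)]    # [n+1, n+1] and [n, n]


    def embedding_dimension(cube):
        """Number of elementary intervals in the product."""
        return len(cube)


    def dimension(cube):
        """Number of non-degenerate elementary intervals in the product."""
        return sum(1 for a, b in cube if b == a + 1)


    def boundary(chain):
        """Z_2 boundary of a chain given as a set of elementary cubes.

        Each cube contributes the terms I_1 x ... x dI_i x ... x I_n of the
        boundary formula; a face reached an even number of times cancels out.
        """
        result = set()
        for cube in chain:
            for i, interval in enumerate(cube):
                for face_interval in interval_boundary(interval):
                    face = cube[:i] + (face_interval,) + cube[i + 1:]
                    result ^= {face}   # adding a cube mod 2 toggles its membership
        return result


    square = ((0, 1), (0, 1))          # [0,1] x [0,1]
    print(dimension(square), embedding_dimension(square))   # 2 2
    print(sorted(boundary({square})))
    # [((0, 0), (0, 1)), ((0, 1), (0, 0)), ((0, 1), (1, 1)), ((1, 1), (0, 1))]
    print(boundary(boundary({square})))                     # set()

Applying ``boundary`` twice gives the empty chain, as expected from :math:`\partial \partial = 0` over
:math:`\mathbb{Z}_2`.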
The *support* of a cube :math:`C` is the set in :math:`\mathbb{R}^n` occupied by :math:`C` (:math:`n` is the embedding
dimension of :math:`C`).

Cubes may be equipped with filtration values, in which case we have a filtered cubical complex. All the cubical
complexes considered in this implementation are filtered cubical complexes (although the range of a filtration may be
a set of two elements).

For further details and theory of cubical complexes, please consult :cite:`kaczynski2004computational` as well as the
following paper :cite:`peikert2012topological`.

Data structure
--------------

The implementation of Cubical complex provides a representation of complexes that occupy a rectangular region in
:math:`\mathbb{R}^n`. This extra assumption allows for a memory efficient way of storing cubical complexes in the form
of so-called bitmaps. Let :math:`R = [b_1,e_1] \times \ldots \times [b_n,e_n]`, for
:math:`b_1,\ldots,b_n,e_1,\ldots,e_n \in \mathbb{Z}`, :math:`b_i \leq e_i`, be the considered rectangular region and
let :math:`\mathcal{K}` be a filtered cubical complex having the rectangle :math:`R` as its support. Note that the
structure of the coordinate system induces a lexicographical ordering of the cells of :math:`\mathcal{K}`. This
ordering is the basis of the presented bitmap-based implementation. In this implementation, the whole cubical complex
is stored as a vector of the values of filtration. This, together with the dimension of :math:`\mathcal{K}` and the
sizes of :math:`\mathcal{K}` in all directions, allows one to determine the dimension, neighborhood, boundary and
coboundary of every cube :math:`C \in \mathcal{K}`.

.. figure:: ../../doc/Bitmap_cubical_complex/Cubical_complex_representation.png
   :alt: Cubical complex.
   :figclass: align-center

   Cubical complex.

Note that the cubical complex in the figure above is, in a natural way, a product of one dimensional cubical complexes
in :math:`\mathbb{R}`.
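
As a rough illustration of the bitmap idea, here is a hypothetical Python class, ``BitmapSketch``, which is not
GUDHI's implementation. It stores one filtration value per cell in a flat list and recovers the dimension, boundary
and coboundary of a cell from its position alone. It assumes that a direction with :math:`n` maximal cubes holds
:math:`2n+1` cells, that even coordinates encode degenerate intervals and odd coordinates non-degenerate ones, and
that the first direction varies fastest in the lexicographical order.

.. code-block:: python

    class BitmapSketch:
        """Hypothetical bitmap: all cells of a box-shaped cubical complex stored
        in one flat list, in lexicographic order.

        Assumptions of this sketch: direction i with n_i maximal cubes holds
        2 * n_i + 1 cells; an even coordinate encodes a degenerate interval and
        an odd coordinate a non-degenerate one; direction 0 varies fastest.
        """

        def __init__(self, maximal_counts):
            self.sizes = [2 * n + 1 for n in maximal_counts]
            self.strides = []
            stride = 1
            for size in self.sizes:
                self.strides.append(stride)
                stride *= size
            # One filtration value per cell (all cells, not only the maximal ones).
            self.filtration = [0.0] * stride

        def coordinates(self, k):
            """Recover the per-direction coordinates of position k by modulo operations."""
            coords = []
            for size in self.sizes:
                k, c = divmod(k, size)
                coords.append(c)
            return coords

        def position(self, coords):
            return sum(c * s for c, s in zip(coords, self.strides))

        def dimension(self, k):
            """Number of non-degenerate intervals, i.e. of odd coordinates."""
            return sum(c % 2 for c in self.coordinates(k))

        def _neighbours(self, k, parity):
            # Move by +-1 along every direction whose coordinate has the given parity.
            coords = self.coordinates(k)
            result = []
            for i, c in enumerate(coords):
                if c % 2 != parity:
                    continue
                for step in (-1, 1):
                    if 0 <= c + step < self.sizes[i]:
                        moved = list(coords)
                        moved[i] = c + step
                        result.append(self.position(moved))
            return result

        def boundary(self, k):
            """Positions of the faces of cell k (shrink a non-degenerate interval)."""
            return self._neighbours(k, parity=1)

        def coboundary(self, k):
            """Positions of the cofaces of cell k (extend a degenerate interval)."""
            return self._neighbours(k, parity=0)


    bitmap = BitmapSketch([3, 3])                         # 3 x 3 maximal squares
    print(len(bitmap.filtration))                         # 49
    print(bitmap.coordinates(24), bitmap.dimension(24))   # [3, 3] 2
    print(sorted(bitmap.boundary(24)))                    # [17, 23, 25, 31]
    print(bitmap.coboundary(0))                           # [1, 7]

For a :math:`3 \times 3` grid of maximal squares this gives :math:`7 \times 7 = 49` cells: 16 vertices, 24 edges and
9 squares. Only the flat list of values is stored; everything else is arithmetic on positions.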
The number of all cubes in each direction is equal to :math:`2n+1`, where :math:`n` is the number of maximal cubes in
the considered direction. Let us consider a cube :math:`C` at position :math:`k` in the bitmap. Knowing the sizes of
the bitmap, we can determine, by a series of modulo operations, which elementary intervals are present in the product
that gives the cube :math:`C`. In a similar way, we can compute the boundary and the coboundary of each cube. Further
details can be found in the literature.

Input Format
------------

In the current implementation, the filtration is given on the maximal cubes, and it is then extended by the lower star
filtration to all cubes. There are a number of constructors that can be used to construct a cubical complex by users
who want to use the code directly. They can be found in the :doc:`cubical_complex_ref`. Currently, a single text-file
input format is supported. It is inspired by the `Perseus software `_ by Vidit Nanda.

.. note::

    While Perseus assumes the filtration of all maximal cubes to be non-negative, here we do not enforce this and
    we allow any filtration values. As a consequence, one cannot use ``-1``'s to indicate missing cubes. If you have
    missing cubes in your complex, please set their filtration to :math:`+\infty` (written ``inf`` in the file).

The file format is described in detail in the `Perseus file format `_ section.

.. testcode::

    import gudhi
    cubical_complex = gudhi.CubicalComplex(perseus_file=gudhi.__root_source_dir__ + \
        '/data/bitmap/cubicalcomplexdoc.txt')
    result_str = 'Cubical complex is of dimension ' + repr(cubical_complex.dimension()) + ' - ' + \
        repr(cubical_complex.num_simplices()) + ' simplices.'
    print(result_str)

The program output is:

.. testoutput::

    Cubical complex is of dimension 2 - 49 simplices.

Periodic boundary conditions
----------------------------

Often one would like to impose periodic boundary conditions on the cubical complex (cf.
:doc:`periodic_cubical_complex_ref`).
Let :math:`I_1\times \ldots \times I_n` be a box that is decomposed with a cubical complex :math:`\mathcal{K}`.
Imposing periodic boundary conditions in direction :math:`i` means that the left and the right sides of the complex
:math:`\mathcal{K}` in that direction are considered the same. In particular, if periodic boundary conditions are
imposed in all directions of a bitmap :math:`\mathcal{K}`, then the complex :math:`\mathcal{K}` becomes an
:math:`n`-dimensional torus. One can use various constructors from the file
Bitmap_cubical_complex_periodic_boundary_conditions_base.h to construct a cubical complex with periodic boundary
conditions.

One can also use Perseus style input files (see `Perseus file format `_) for the specific periodic case:

.. testcode::

    import gudhi
    periodic_cc = gudhi.PeriodicCubicalComplex(perseus_file=gudhi.__root_source_dir__ + \
        '/data/bitmap/periodiccubicalcomplexdoc.txt')
    result_str = 'Periodic cubical complex is of dimension ' + repr(periodic_cc.dimension()) + ' - ' + \
        repr(periodic_cc.num_simplices()) + ' simplices.'
    print(result_str)

The program output is:

.. testoutput::

    Periodic cubical complex is of dimension 2 - 42 simplices.

Or it can be defined as follows:

.. testcode::

    from gudhi import PeriodicCubicalComplex as pcc
    periodic_cc = pcc(top_dimensional_cells = [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
        periodic_dimensions=[True, False])
    result_str = 'Periodic cubical complex is of dimension ' + repr(periodic_cc.dimension()) + ' - ' + \
        repr(periodic_cc.num_simplices()) + ' simplices.'
    print(result_str)

The program output is:

.. testoutput::

    Periodic cubical complex is of dimension 2 - 42 simplices.

Examples
--------

End user programs are available in the python/example/ folder.

Tutorial
--------

This `notebook `_ explains how to represent sublevel sets of functions using cubical complexes.

Scikit-learn like interface example
-----------------------------------

In this example, handwritten digits are used as input.
A TDA scikit-learn pipeline is constructed; it is composed of:

#. :class:`~gudhi.sklearn.cubical_persistence.CubicalPersistence` that builds a cubical complex from the inputs and
   returns its persistence diagrams
#. :class:`~gudhi.representations.DiagramSelector` that removes the non-finite points from the persistence diagrams
#. :class:`~gudhi.representations.PersistenceImage` that builds the persistence images from persistence diagrams
#. `SVC `_ which is a scikit-learn support vector classifier.

This ML pipeline is trained to detect whether the handwritten digit is an '8' or not, thanks to the fact that an '8'
has two holes in :math:`\mathbf{H}_1` or, as used in this example, three connected components in
:math:`\mathbf{H}_0` (the background outside the digit and inside its two holes).

.. code-block:: python

    # Standard scientific Python imports
    import numpy as np
    # Standard scikit-learn imports
    from sklearn.datasets import fetch_openml
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn import metrics

    # Import TDA pipeline requirements
    from gudhi.sklearn.cubical_persistence import CubicalPersistence
    from gudhi.representations import PersistenceImage, DiagramSelector

    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)

    # Target is: "is an eight ?"
    y = (y == "8") * 1
    print("There are", np.sum(y), "eights out of", len(y), "numbers.")

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
    pipe = Pipeline(
        [
            ("cub_pers", CubicalPersistence(only_this_dim=0, dimensions=[28, 28], n_jobs=-2)),
            ("finite_diags", DiagramSelector(use=True, point_type="finite")),
            (
                "pers_img",
                PersistenceImage(bandwidth=50, weight=lambda x: x[1] ** 2, im_range=[0, 256, 0, 256], resolution=[20, 20]),
            ),
            ("svc", SVC()),
        ]
    )

    # Learn from the train subset
    pipe.fit(X_train, y_train)
    # Predict from the test subset
    predicted = pipe.predict(X_test)

    print(f"Classification report for TDA pipeline {pipe}:\n"
          f"{metrics.classification_report(y_test, predicted)}\n")

.. code-block:: none

    There are 6825 eights out of 70000 numbers.
    Classification report for TDA pipeline Pipeline(steps=[('cub_pers',
                     CubicalPersistence(dimensions=[28, 28], n_jobs=-2)),
                    ('finite_diags', DiagramSelector(use=True)),
                    ('pers_img',
                     PersistenceImage(bandwidth=50, im_range=[0, 256, 0, 256],
                                      weight=<function <lambda> at 0x7f3e54137ae8>)),
                    ('svc', SVC())]):
                  precision    recall  f1-score   support

               0       0.97      0.99      0.98     25284
               1       0.92      0.68      0.78      2716

        accuracy                           0.96     28000
       macro avg       0.94      0.84      0.88     28000
    weighted avg       0.96      0.96      0.96     28000
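
The :math:`\mathbf{H}_0` argument can be checked by hand on toy images. The pure-Python sketch below is not how GUDHI
computes persistence; it uses the standard union-find approach with the elder rule, and the function name
``h0_sublevel_persistence`` is hypothetical. It assumes that pixels are the top-dimensional cells of the complex (so a
sublevel set is the union of the closed pixels whose value does not exceed the threshold, and two pixels touch as soon
as they share a vertex) and that, as in MNIST, the background is dark (0) and the ink bright (255).

.. code-block:: python

    def h0_sublevel_persistence(image):
        """0-dimensional persistence pairs (birth, death) of a 2D image's sublevel sets.

        Assumes pixels are top-dimensional cells, so two pixels are connected as
        soon as their closed squares share a vertex (8-connectivity).
        Zero-length pairs are dropped.
        """
        rows, cols = len(image), len(image[0])
        # Pixels enter the filtration in increasing order of value.
        order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
        parent, birth, pairs = {}, {}, []

        def find(p):
            while parent[p] != p:
                parent[p] = parent[parent[p]]  # path halving
                p = parent[p]
            return p

        for value, r, c in order:
            p = (r, c)
            parent[p], birth[p] = p, value
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    q = (r + dr, c + dc)
                    if q == p or q not in parent:  # skip self and pixels not yet added
                        continue
                    rp, rq = find(p), find(q)
                    if rp == rq:
                        continue
                    # Elder rule: the component born later dies at the current value.
                    old, young = (rp, rq) if birth[rp] <= birth[rq] else (rq, rp)
                    if birth[young] < value:
                        pairs.append((birth[young], value))
                    parent[young] = old
        # Components that never merge are essential: they never die.
        pairs.extend((birth[root], float("inf")) for root in {find(p) for p in parent})
        return pairs


    ink, bg = 255, 0
    eight = [
        [bg, bg,  bg,  bg,  bg],
        [bg, ink, ink, ink, bg],
        [bg, ink, bg,  ink, bg],
        [bg, ink, ink, ink, bg],
        [bg, ink, bg,  ink, bg],
        [bg, ink, ink, ink, bg],
        [bg, bg,  bg,  bg,  bg],
    ]
    print(sorted(h0_sublevel_persistence(eight)))  # [(0, 255), (0, 255), (0, inf)]

On the toy '8', the outside and the two holes appear at value 0 as three components; two of them die when the ink
enters at 255. A '0' drawn the same way yields only two components, and a '1' only one.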