python/README.rst


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130

Persistent Homology Algorithm Toolkit (PHAT)
============================================

This is a Python interface for the `Persistent Homology Algorithm Toolkit`_, a software library
that contains methods for computing the persistence pairs of a 
filtered cell complex represented by an ordered boundary matrix with Z\ :sub:`2` coefficients.

For an introduction to persistent homology, see the textbook [1]_. This software package
contains code for several algorithmic variants:

* The "standard" algorithm (see [1]_, p.153)
* The "row" algorithm from [2]_ (called pHrow in that paper)
* The "twist" algorithm, as described in [3]_ (default algorithm)
* The "chunk" algorithm presented in [4]_ 
* The "spectral sequence" algorithm (see [1]_, p.166)

All but the standard algorithm exploit the special structure of the boundary matrix
to take shortcuts in the computation. The chunk and the spectral sequence algorithms
make use of multiple CPU cores if PHAT is compiled with OpenMP support.

All algorithms are implemented as function objects that manipulate a given 
``boundary_matrix`` (to be defined below) object to reduced form. 
From this reduced form one can then easily extract the persistence pairs. 
Alternatively, one can use the ``compute_persistence_pairs function`` which takes an 
algorithm as a parameter, reduces the given ``boundary_matrix`` and stores the 
resulting pairs in a given ``persistence_pairs`` object.

The ``boundary_matrix`` class takes a "Representation" class as a parameter. 
This representation defines how columns of the matrix are represented and how 
low-level operations (e.g., column additions) are performed. The right choice of the 
representation class can be as important for the performance of the program as choosing
the algorithm. We provide the following choices of representation classes:

* ``vector_vector``: Each column is represented as a sorted ``std::vector`` of integers, containing the indices of the non-zero entries of the column. The matrix itself is a ``std::vector`` of such columns.
* ``vector_heap``: Each column is represented as a heapified ``std::vector`` of integers, containing the indices of the non-zero entries of the column. The matrix itself is a ``std::vector`` of such columns.
* ``vector_set``: Each column is a ``std::set`` of integers, with the same meaning as above. The matrix is stored as a ``std::vector`` of such columns.
* ``vector_list``: Each column is a sorted ``std::list`` of integers, with the same meaning as above. The matrix is stored as a ``std::vector`` of such columns.
* ``sparse_pivot_column``: The matrix is stored as in the vector_vector representation. However, when a column is manipulated, it is first  converted into a ``std::set``, using an extra data field called the "pivot column".  When another column is manipulated later, the pivot column is converted back to  the ``std::vector`` representation. This can lead to significant speed improvements when many columns  are added to a given pivot column consecutively. In a multicore setup, there is one pivot column per thread.
* ``heap_pivot_column``: The same idea as in the sparse version. Instead of a ``std::set``, the pivot column is represented by a ``std::priority_queue``. 
* ``full_pivot_column``: The same idea as in the sparse version. However, instead of a ``std::set``, the pivot column is expanded into a bit vector of size n (the dimension of the matrix). To avoid costly initializations, the class remembers which entries have been manipulated for a pivot column and updates only those entries when another column becomes the pivot.
* ``bit_tree_pivot_column`` (default representation): Similar to the ``full_pivot_column`` but the implementation is more efficient. Internally it is a bit-set with fast iteration over nonzero elements, and fast access to the maximal element. 

Installation
------------

Suppose you have checked out the PHAT repository at location $PHAT. Then you can:

	cd $PHAT/python

	ln -s ../include include  # (or copy, we just need $PHAT/include to be in the current folder as well)

	pip install pybind11

	python setup.py install

This will install PHAT for whatever Python installation your ``python`` executable is associated with.
Please ensure you use the ``pip`` that comes from the same directory where your ``python`` executable lives!

Currently, the PHAT Python bindings are known to work on:

* Linux with Python 2.7 (tested on Ubuntu 14.04 with system Python)
* Linux with Python 3.5 (tested on Ubuntu 14.04 with Anaconda)
* Mac OS X with Python 2.7 (tested on El Capitan with Anaconda)
* Mac OS X with Python 3.5 (tested on El Capitan with Anaconda)

Other configurations are untested.

Please let us know if there is a platform you'd like us to support, we will do so if we can.

Sample usage
------------

We will build an ordered boundary matrix of this simplicial complex consisting of a single triangle::

     3
     |\\
     | \\
     |  \\
     |   \\ 4
    5|    \\
     |     \\
     |  6   \\
     |       \\
     |________\\
     0    2    1

Now the code::

    import phat

    # define a boundary matrix with the chosen internal representation
    boundary_matrix = phat.boundary_matrix(representation = phat.representations.vector_vector)

    # set the respective columns -- (dimension, boundary) pairs
    boundary_matrix.columns = [ (0, []),
                                (0, []),
                                (1, [0,1]),
                                (0, []),
                                (1, [1,3]),
                                (1, [0,3]),
                                (2, [2,4,5])]

    # or equivalently, boundary_matrix = phat.boundary_matrix(representation = ..., columns = ...)
    # would combine the creation of the matrix and the assignment of the columns

    # print some information of the boundary matrix:
    print("\nThe boundary matrix has %d columns:" % len(boundary_matrix.columns))
    for col in boundary_matrix.columns:
        s = "Column %d represents a cell of dimension %d." % (col.index, col.dimension)
        if (col.boundary):
            s = s + " Its boundary consists of the cells " + " ".join([str(c) for c in col.boundary])
        print(s)
    print("Overall, the boundary matrix has %d entries." % len(boundary_matrix))

    pairs = boundary_matrix.compute_persistence_pairs()

    pairs.sort()

    print("\nThere are %d persistence pairs: " % len(pairs))
    for pair in pairs:
        print("Birth: %d, Death: %d" % pair)

References:

.. [1] H.Edelsbrunner, J.Harer: Computational Topology, An Introduction. American Mathematical Society, 2010, ISBN 0-8218-4925-5
.. [2] V.de Silva, D.Morozov, M.Vejdemo-Johansson: Dualities in persistent (co)homology. Inverse Problems 27, 2011
.. [3] C.Chen, M.Kerber: Persistent Homology Computation With a Twist. 27th European Workshop on Computational Geometry, 2011.
.. [4] U.Bauer, M.Kerber, J.Reininghaus: Clear and Compress: Computing Persistent Homology in Chunks. arXiv:1303.0477_
.. _arXiv:1303.0477: http://arxiv.org/pdf/1303.0477.pdf
.. _`Persistent Homology Algorithm Toolkit`: https://bitbucket.org/phat/phat-code