Introduction
=====

This package is a cleaned-up subset of the hodgepodge of Python glue
that I regularly use to massage data into and out of persistent
homology and other TDA computations. It does *not* compute anything
itself, and exists primarily to marshal data into and out of the
formats used by the excellent [DIPHA](https://github.com/DIPHA/dipha)
software, in order to make it easier to pre- and post-process data in
Python. It also does some plotting.

Caveats
-----

I decided to clean up and release a subset of these scripts to make
life slightly easier for those who compute persistent homology and
prefer to manipulate data in Python. The scripts come with the
following caveats:

- The scripts were too messy to release as they were, and the clean-up
  process means that this is now essentially untested software
  again. Beware.
- There is nothing of substance here. This is just glue around DIPHA.
- I use Python as modern Perl, and my experience with it is limited to
  quickly manipulating data without too much thought for writing
  structured software. Beware.
- I make no attempt to accurately reflect the full capabilities of
  DIPHA.
- Since Python will let you write `Interval("banana", 5)`, so will I,
  and you and your persistence generator "from banana to 5" can go
  solve problems down the road.
  


Installation
-----

Version 0.0.10 is available
[here](https://nonempty.org/software/python-phstuff/phstuff-0.0.10.tar.gz). The
package can be installed using standard Python tools, for example by
doing `python setup.py install --user` or something similar.

The package requires NumPy and Matplotlib. If you want the package to
manage running DIPHA as well, you also need working `dipha` and
`mpirun` executables. If the environment variables `DIPHA` and/or
`MPIRUN` are set, their values are used as the paths to these
executables; otherwise, the executables are searched for in `PATH`.
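The lookup order described above (environment variable first, then `PATH`) can be sketched with the standard library as follows. This is a hypothetical helper written for illustration, not code taken from phstuff, and it treats an empty environment variable as unset.

```python
import os
import shutil


def find_executable(env_var, default_name):
    """Return the executable named by env_var, or else default_name on PATH.

    Hypothetical helper illustrating the lookup order; not phstuff's code.
    """
    override = os.environ.get(env_var)
    if override:
        # A non-empty environment variable wins, e.g. DIPHA=/opt/dipha/dipha.
        return override
    # Fall back to searching PATH; shutil.which returns None if nothing matches.
    found = shutil.which(default_name)
    if found is None:
        raise FileNotFoundError(
            "%s is not on PATH and %s is not set" % (default_name, env_var))
    return found
```

For example, `find_executable("DIPHA", "dipha")` and `find_executable("MPIRUN", "mpirun")` would locate the two programs this package needs.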

Todo/missing
----

* Polish support for general filtered complexes (DIPHA's
  `DIPHA_WEIGHTED_BOUNDARY_MATRIX`) to allow for non-flag
  complexes. This has high priority, and should be added soon.
* More flexible plotting.
* Possibly introduce a barcode class that keeps track of its own
  maximal scale.
* HTML documentation from the docstrings.
* Accessing persistence diagrams in degrees that are empty should give
  an empty list rather than throw an exception.
* Both terms "persistence diagram" and "barcode" are used. Pick one to
  make function names more guessable.
* Allow overriding temporary file directory.
* PHAT exports an actual library interface, so we should also
  interface with that, avoiding the annoying spawning of processes
  necessary with DIPHA.
  
Feedback
-----

Feedback, bug reports and feature requests are happily taken,
preferably by [e-mail](mailto:gard.spreemann@epfl.ch).


Use
=====

I primarily use this module to convert to and from DIPHA's file
formats. The `DiphaRunner` class is available for those who would
rather not run DIPHA manually and who don't need its distributed
computation capabilities.

I am not aiming for great performance, efficiency or generality with
this code. If you want any of these, you are better off writing the
DIPHA files yourself. The modules are intended to be easy to use.


Examples
-----

### Saving a complete weighted graph in DIPHA's format

This is as simple as

    import phstuff.diphawrapper as dipha
    import numpy as np
    
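    weights = np.random.uniform(0, 1, (100, 100)) # Random edge weights.
    weights = (weights + weights.T)/2.0           # Symmetrize: the graph is undirected.
    np.fill_diagonal(weights, 0)                  # Zero diagonal, no self-loops.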
    dipha.save_weight_matrix("weights.dipha", weights)
    
DIPHA can now be run on `weights.dipha`.
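The file written above is a small binary file. The sketch below writes and reads such a matrix by hand with only the standard library, assuming the layout described in DIPHA's own documentation: an int64 magic number 8067171840, an int64 file-type code (7 for a distance matrix), an int64 point count N, then N² float64 entries in row-major order, all little-endian as on typical x86 machines. That `save_weight_matrix` emits exactly this file type is our assumption, and the helper names are hypothetical; check DIPHA's documentation before relying on the byte layout.

```python
import io
import struct

# Assumed constants from DIPHA's documented file format (verify against DIPHA).
DIPHA_MAGIC = 8067171840
DIPHA_DISTANCE_MATRIX = 7


def write_distance_matrix(stream, matrix):
    """Write a square list-of-lists of floats in DIPHA's distance-matrix layout.

    Hypothetical helper, not phstuff's code.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    # Header: magic number, file type, number of points (all int64).
    stream.write(struct.pack("<qqq", DIPHA_MAGIC, DIPHA_DISTANCE_MATRIX, n))
    # Body: n*n doubles, one row at a time.
    for row in matrix:
        stream.write(struct.pack("<%dd" % n, *row))


def read_distance_matrix(stream):
    """Inverse of write_distance_matrix, handy for sanity checks."""
    magic, ftype, n = struct.unpack("<qqq", stream.read(24))
    if magic != DIPHA_MAGIC or ftype != DIPHA_DISTANCE_MATRIX:
        raise ValueError("not a DIPHA distance matrix")
    return [list(struct.unpack("<%dd" % n, stream.read(8 * n)))
            for _ in range(n)]


if __name__ == "__main__":
    buf = io.BytesIO()
    write_distance_matrix(buf, [[0.0, 0.3], [0.3, 0.0]])
    buf.seek(0)
    print(read_distance_matrix(buf))  # [[0.0, 0.3], [0.3, 0.0]]
```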

### Loading a DIPHA barcode file

To load the persistence diagrams that DIPHA stored in `out.dipha`
(for example after running DIPHA on the file from the previous
example) and print the intervals in dimension 1, do:

    import phstuff.diphawrapper as dipha

    barcode = dipha.load_barcode("out.dipha")
    for interval in barcode[1]:
        print(interval)
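`load_barcode` returns something that can be indexed by dimension. A minimal standard-library sketch of reading such a file follows, assuming DIPHA's documented persistence-diagram layout: magic number, file-type code 2, an int64 pair count, then for each pair an int64 dimension and two float64 values (birth, death), with essential classes flagged by a negative dimension stored as -d-1. The tuple representation, the use of infinity for essential classes, and the helper name are our own choices, not phstuff's; dimensions without intervals give an empty list, as the todo list asks for.

```python
import io
import math
import struct
from collections import defaultdict

# Assumed constants from DIPHA's documented file format (verify against DIPHA).
DIPHA_MAGIC = 8067171840
DIPHA_PERSISTENCE_DIAGRAM = 2


def read_diagram(stream):
    """Return {dimension: [(birth, death), ...]} from a DIPHA diagram stream.

    Hypothetical helper; phstuff's own interval representation may differ.
    """
    magic, ftype, npairs = struct.unpack("<qqq", stream.read(24))
    if magic != DIPHA_MAGIC or ftype != DIPHA_PERSISTENCE_DIAGRAM:
        raise ValueError("not a DIPHA persistence diagram")
    diagram = defaultdict(list)  # dimensions without intervals read as []
    for _ in range(npairs):
        dim, birth, death = struct.unpack("<qdd", stream.read(24))
        if dim < 0:
            # Essential class: stored as -d-1; treat it as never dying.
            dim, death = -dim - 1, math.inf
        diagram[dim].append((birth, death))
    return diagram


if __name__ == "__main__":
    # A tiny in-memory diagram: one finite 1-dimensional pair and one
    # essential 0-dimensional class.
    raw = struct.pack("<qqq", DIPHA_MAGIC, DIPHA_PERSISTENCE_DIAGRAM, 2)
    raw += struct.pack("<qdd", 1, 0.25, 0.75)
    raw += struct.pack("<qdd", -1, 0.0, 1.0)
    bars = read_diagram(io.BytesIO(raw))
    print(bars[1])  # [(0.25, 0.75)]
    print(bars[0])  # [(0.0, inf)]
    print(bars[2])  # []
```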

### Excluding edges/simplices above a certain weight

If we want to exclude all edges with weights above some threshold in
the filtration, we can either use `save_edge_list` or
`save_masked_weight_matrix`. An example of the latter is shown below:

    import phstuff.diphawrapper as dipha
    import numpy as np
    import numpy.ma as ma
    
    weights = np.random.uniform(0, 1, (100, 100))
    weights = (weights + weights.T)/2.0 # Symmetrize matrix (DIPHA doesn't specify 
                                        # which part of the weight matrix it actually
                                        # uses, so be safe and symmetrize).
    np.fill_diagonal(weights, 0)
    masked = ma.masked_greater(weights, 0.5) # All weights above 0.5 are
                                             # masked out and will not be
                                             # present in the graph,
                                             # effectively ending the
                                             # filtration at 0.5.
    dipha.save_masked_weight_matrix("weights.dipha", masked)

For more information about masked arrays, see
[the NumPy documentation](https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html). Do
remember that the masked entries are the edges that will *not* be
present in the graph.
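Masking out everything strictly above the threshold leaves exactly the edges whose weight is at most the threshold. The standard-library sketch below makes that explicit by turning a symmetric weight matrix into an `(i, j, weight)` edge list, roughly the information `save_edge_list` presumably takes; the helper name and tuple format are hypothetical and do not describe phstuff's actual signature.

```python
def edges_up_to(weights, threshold):
    """List (i, j, w) with i < j and w <= threshold.

    Mirrors ma.masked_greater(weights, threshold): entries strictly greater
    than the threshold are dropped, entries equal to it are kept. Only the
    upper triangle is read, so the matrix is assumed to be symmetric.
    Hypothetical helper, not phstuff's API.
    """
    n = len(weights)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            w = weights[i][j]
            if w <= threshold:
                edges.append((i, j, w))
    return edges


if __name__ == "__main__":
    w = [[0.0, 0.2, 0.9],
         [0.2, 0.0, 0.5],
         [0.9, 0.5, 0.0]]
    print(edges_up_to(w, 0.5))  # [(0, 1, 0.2), (1, 2, 0.5)]
```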


### Running DIPHA from Python

If you prefer to control the entire computation from Python, this
package can generate the necessary temporary files and run DIPHA on
them. The `mpirun` and `dipha` executables must be on your `PATH`, or
be specified through the `MPIRUN` and `DIPHA` environment variables.

    import phstuff.diphawrapper as dipha
    import numpy as np
    import phstuff.barcode as bc
    import matplotlib.pyplot as plt

    weights = np.random.uniform(0, 1, (100, 100))
    weights = (weights + weights.T)/2.0 # Symmetrize matrix (DIPHA doesn't specify 
                                        # which part of the weight matrix it actually
                                        # uses, so be safe and symmetrize).
    np.fill_diagonal(weights, 0)
    dipharunner = dipha.DiphaRunner(2) # Compute up to 2-simplices.
    dipharunner.weight_matrix(weights) # See also dipharunner.masked_weight_matrix,
                                       # and dipharunner.edge_list.
    dipharunner.run()
    
    for interval in dipharunner.barcode[1]:
        print(interval)

    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    bc.plot(ax, dipharunner.barcode[1], weights.min(), weights.max())
    plt.show()
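Behind the scenes, the runner writes temporary files, spawns DIPHA through mpirun, and reads the result back. The sketch below only shows what such a command line might look like. The helper name is hypothetical, and both the `-np` flag (understood by common MPI launchers such as Open MPI and MPICH) and DIPHA's `--upper_dim` option, which we assume plays the role of the `2` passed to `DiphaRunner`, are assumptions to check against `mpirun --help` and `dipha --help` on your system. It does not describe phstuff's internals.

```python
import os
import tempfile


def dipha_command(input_path, output_path, upper_dim, processes=1,
                  dipha="dipha", mpirun="mpirun"):
    """Build the argument list for one DIPHA run under MPI.

    Hypothetical helper. The "-np" launcher flag and DIPHA's "--upper_dim"
    option are assumptions; check them against your installation.
    """
    return [mpirun, "-np", str(processes),
            dipha, "--upper_dim", str(upper_dim),
            input_path, output_path]


if __name__ == "__main__":
    # A runner would write the input into a scratch directory like this one,
    # pass the command to subprocess.run(cmd, check=True), then load the output.
    with tempfile.TemporaryDirectory() as tmp:
        cmd = dipha_command(os.path.join(tmp, "weights.dipha"),
                            os.path.join(tmp, "out.dipha"), upper_dim=2)
        print(" ".join(cmd))
```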

### PH of an arbitrary simplicial complex (EXPERIMENTAL)

The code for arbitrary simplicial complexes, the `simplicial` module,
is dirty, fragile and inefficient. Its API and behavior may change at
any time.

    import phstuff.diphawrapper as dipha
    import phstuff.barcode as bc
    import matplotlib.pyplot as plt
    import phstuff.simplicial as simpl

    cplx = simpl.Complex()

    cplx.add([0], 0)
    cplx.add([1], 0)
    cplx.add([2], 0)
    cplx.add([3], 0)
    cplx.add([4], 0)
    cplx.add([0,1], 0)
    cplx.add([0,2], 0)
    cplx.add([2,4], 0)
    cplx.add([3,4], 0)
    cplx.add([1,3], 1)
    cplx.add([0,3], 2)
    cplx.add([2,3], 3)
    cplx.add([0,2,3], 10)
    cplx.add([0,1,3], 20)

    dipharunner = dipha.DiphaRunner(2) # Compute up to 2-simplices.
    dipharunner.simplicial(cplx)
    dipharunner.run()

    for interval in dipharunner.barcode[1]:
        print(interval)

    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    bc.plot(ax, dipharunner.barcode[1], 0.0, 20.0)
    plt.show()
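To hand a filtered complex like the one above to DIPHA, it has to be flattened into a boundary matrix (DIPHA's `DIPHA_WEIGHTED_BOUNDARY_MATRIX`). The sketch below shows the bookkeeping involved, independently of how the `simplicial` module actually does it: simplices are sorted by (value, dimension) so that every face precedes its cofaces, and each column lists the indices of the codimension-1 faces. It also checks the filtration condition that no simplex appears before one of its faces. The dictionary input format and helper name are hypothetical.

```python
from itertools import combinations


def boundary_matrix(values):
    """values: dict mapping a sorted vertex tuple to its filtration value.

    Returns (order, columns): order lists the simplices in filtration order,
    and columns[k] is the sorted list of indices (into order) of the
    codimension-1 faces of order[k]. Hypothetical helper, not phstuff's API.
    """
    # Faces have fewer vertices, so sorting by (value, dimension) puts each
    # face before its cofaces whenever the filtration is valid.
    order = sorted(values, key=lambda s: (values[s], len(s), s))
    index = {s: k for k, s in enumerate(order)}
    columns = []
    for s in order:
        # Vertices have an empty boundary; otherwise drop one vertex at a time.
        faces = list(combinations(s, len(s) - 1)) if len(s) > 1 else []
        for f in faces:
            if f not in values:
                raise ValueError("face %r of %r is missing" % (f, s))
            if values[f] > values[s]:
                raise ValueError("face %r enters after %r" % (f, s))
        columns.append(sorted(index[f] for f in faces))
    return order, columns


if __name__ == "__main__":
    # The complex from the example above.
    cplx = {(0,): 0, (1,): 0, (2,): 0, (3,): 0, (4,): 0,
            (0, 1): 0, (0, 2): 0, (2, 4): 0, (3, 4): 0,
            (1, 3): 1, (0, 3): 2, (2, 3): 3,
            (0, 2, 3): 10, (0, 1, 3): 20}
    order, columns = boundary_matrix(cplx)
    for simplex, column in zip(order, columns):
        print(cplx[simplex], simplex, column)
```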


### PH of an alpha complex made with CGAL

TODO.