Introduction
=====

This package is a cleaned-up subset of the hodgepodge of Python glue that I regularly use to massage data into and out of persistent homology and other TDA computations. It does *not* compute anything itself. It exists primarily to marshal data into and out of the formats used by the excellent [DIPHA](https://github.com/DIPHA/dipha) software, to make it easier to pre- and post-process data in Python. It also does some plotting.

Caveats
-----

I decided to clean up and release a subset of these scripts to make life slightly easier for those who compute persistent homology and prefer to manipulate data in Python. The scripts come with the following caveats:

- The scripts were too messy to release as they were, and the cleanup process means that this is now essentially untested software again. Beware.
- There is nothing of substance here. This is just glue around DIPHA.
- I use Python as a modern Perl, and my experience with it is limited to quickly manipulating data without much thought for writing structured software. Beware.
- I make no attempt to accurately reflect the full capabilities of DIPHA.
- Since Python will let you write `Interval("banana", 5)`, so will I, and you and your persistence generator "from banana to 5" can go solve problems down the road.

Installation
-----

Version 0.0.11 is available [here](https://nonempty.org/software/python-phstuff/phstuff-0.0.11.tar.gz). The package can be installed using standard Python tools, for example by running `python setup.py install --user` or something similar.

The package requires NumPy and Matplotlib. If you want the package to manage running DIPHA as well, you of course need working `dipha` and `mpirun` executables. If the environment variables `DIPHA` and/or `MPIRUN` are set, their values specify these executables. Otherwise, the executables are searched for in `PATH`.
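The lookup order described above (environment variable first, then `PATH`) can be sketched as follows. This is not the package's actual code: `find_executable` is a hypothetical name, and treating an empty environment variable as unset is an assumption made for the sketch.

```python
import os
import shutil
from typing import Optional


def find_executable(env_var: str, default_name: str) -> Optional[str]:
    """Hypothetical helper illustrating the executable lookup order.

    If the environment variable `env_var` (e.g. DIPHA or MPIRUN) is set to a
    non-empty value, that value is returned as-is. Otherwise `default_name`
    is searched for in the directories listed in PATH. Returns None if
    nothing is found.
    """
    override = os.environ.get(env_var)
    if override:  # Assumption: an empty value counts as "not set".
        return override
    # shutil.which returns the full path of the first match on PATH, or None.
    return shutil.which(default_name)
```

For example, `find_executable("DIPHA", "dipha")` and `find_executable("MPIRUN", "mpirun")` would give the two executables a DIPHA run needs, or `None` if either is missing.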
Todo/missing
----

* Polish support for general filtered complexes (DIPHA's `DIPHA_WEIGHTED_BOUNDARY_MATRIX`) to allow for non-flag complexes. This has high priority and should be added soon.
* More flexible plotting.
* Possibly introduce a barcode class that keeps track of its own maximal scale.
* HTML documentation from the docstrings.
* Accessing persistence diagrams in degrees that are empty should give an empty list rather than throw an exception.
* Both terms "persistence diagram" and "barcode" are used. Pick one to make function names more guessable.
* Allow overriding the temporary file directory.
* PHAT exports an actual library interface, so we should also interface with that, avoiding the annoying process spawning necessary with DIPHA.

Feedback
-----

Feedback, bug reports and feature requests are happily taken, preferably by [e-mail](mailto:gard.spreemann@epfl.ch).

Use
=====

I primarily use this module to convert to and from DIPHA's file formats. The `DiphaRunner` class is available for those who would like to avoid running DIPHA manually and don't need its distributed computation capabilities.

I am not aiming for great performance, efficiency or generality with this code. If you want any of these, you are better off writing the DIPHA files yourself. The modules are intended to be easy to use.

Examples
-----

### Saving a complete weighted graph in DIPHA's format

This is as simple as

    import phstuff.diphawrapper as dipha
    import numpy as np

    weights = np.random.uniform(0, 1, (100, 100))  # The graph.
    dipha.save_weight_matrix("weights.dipha", weights)

DIPHA can now be run on `weights.dipha`.
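For those who would rather write the DIPHA files themselves, the sketch below shows one plausible on-disk layout for a full distance matrix. The header constants and layout are assumptions based on a reading of DIPHA's source, not something this package documents, so check them against the DIPHA version you use. `write_distance_matrix` and `read_distance_matrix` are hypothetical names; `dipha.save_weight_matrix` remains the supported route.

```python
import numpy as np

# ASSUMPTIONS (not from this package): every DIPHA file starts with two
# little-endian int64 values, a magic number and a file-type code. A full
# distance matrix then stores the number of points n (int64), followed by
# all n*n entries as little-endian float64 in row-major order.
DIPHA_MAGIC = 8067171840
DIPHA_DISTANCE_MATRIX = 7


def write_distance_matrix(path, weights):
    """Write a square weight matrix in the assumed DIPHA distance-matrix layout."""
    w = np.asarray(weights, dtype="<f8")
    if w.ndim != 2 or w.shape[0] != w.shape[1]:
        raise ValueError("weight matrix must be square")
    header = np.array([DIPHA_MAGIC, DIPHA_DISTANCE_MATRIX, w.shape[0]], dtype="<i8")
    with open(path, "wb") as f:
        f.write(header.tobytes())
        f.write(w.tobytes())  # tobytes() emits row-major (C order) bytes.


def read_distance_matrix(path):
    """Read a file in the layout written by write_distance_matrix."""
    with open(path, "rb") as f:
        magic, file_type, n = np.frombuffer(f.read(24), dtype="<i8")
        if magic != DIPHA_MAGIC or file_type != DIPHA_DISTANCE_MATRIX:
            raise ValueError("not a DIPHA distance matrix file")
        data = np.frombuffer(f.read(8 * n * n), dtype="<f8")
    return data.reshape(n, n)
```

The reader checks the magic number and type code before trusting the size field, so a file of the wrong kind fails loudly instead of being reshaped into garbage. If DIPHA rejects a file written this way, trust `dipha.save_weight_matrix` over this sketch.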
### Loading a DIPHA barcode file

To load the dimension-`1` persistence diagram that DIPHA stored in `out.dipha` (for example after running DIPHA on the file from the previous example), do:

    import phstuff.diphawrapper as dipha

    barcode = dipha.load_barcode("out.dipha")

    for interval in barcode[1]:
        print(interval)

### Excluding edges/simplices above a certain weight

If we want to exclude all edges with weights above some threshold from the filtration, we can use either `save_edge_list` or `save_masked_weight_matrix`. An example of the latter is shown below:

    import phstuff.diphawrapper as dipha
    import numpy as np
    import numpy.ma as ma

    weights = np.random.uniform(0, 1, (100, 100))
    # Symmetrize the matrix. DIPHA doesn't specify which part of the weight
    # matrix it actually uses, so be safe and symmetrize.
    weights = (weights + weights.T) / 2.0
    np.fill_diagonal(weights, 0)

    # All weights above 0.5 are masked out and will not be present in the
    # graph, effectively ending the filtration at 0.5.
    masked = ma.masked_greater(weights, 0.5)
    dipha.save_masked_weight_matrix("weights.dipha", masked)

For more information about masked arrays, see [the NumPy documentation](https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html). Do remember that the masked entries are the edges that will *not* be present in the graph.

### Running DIPHA from Python

If you prefer to control the entire computation from Python, this package can generate the necessary temporary files and run DIPHA on them. The `mpirun` and `dipha` executables must be in a directory listed in `PATH`, or they must be specified through the `MPIRUN` and `DIPHA` environment variables.

    import phstuff.diphawrapper as dipha
    import numpy as np
    import phstuff.barcode as bc
    import matplotlib.pyplot as plt

    weights = np.random.uniform(0, 1, (100, 100))
    # Symmetrize the matrix. DIPHA doesn't specify which part of the weight
    # matrix it actually uses, so be safe and symmetrize.
    weights = (weights + weights.T) / 2.0
    np.fill_diagonal(weights, 0)

    dipharunner = dipha.DiphaRunner(2)  # Compute up to 2-simplices.
    # See also dipharunner.masked_weight_matrix and dipharunner.edge_list.
    dipharunner.weight_matrix(weights)
    dipharunner.run()

    for interval in dipharunner.barcode[1]:
        print(interval)

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    bc.plot(ax, dipharunner.barcode[1], weights.min(), weights.max())
    plt.show()

### PH of an arbitrary simplicial complex (EXPERIMENTAL)

The code for arbitrary simplicial complexes, the `simplicial` module, is dirty, fragile and inefficient. Its API and behavior may change at any time.

    import phstuff.diphawrapper as dipha
    import phstuff.barcode as bc
    import matplotlib.pyplot as plt
    import phstuff.simplicial as simpl

    cplx = simpl.Complex()
    # Five vertices, all present from filtration value 0.
    cplx.add([0], 0)
    cplx.add([1], 0)
    cplx.add([2], 0)
    cplx.add([3], 0)
    cplx.add([4], 0)
    # Edges, each with the filtration value at which it appears.
    cplx.add([0, 1], 0)
    cplx.add([0, 2], 0)
    cplx.add([2, 4], 0)
    cplx.add([3, 4], 0)
    cplx.add([1, 3], 1)
    cplx.add([0, 3], 2)
    cplx.add([2, 3], 3)
    # Triangles. A simplex must not appear before any of its faces.
    cplx.add([0, 2, 3], 10)
    cplx.add([0, 1, 3], 20)

    dipharunner = dipha.DiphaRunner(2)  # Compute up to 2-simplices.
    dipharunner.simplicial(cplx)
    dipharunner.run()

    for interval in dipharunner.barcode[1]:
        print(interval)

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    bc.plot(ax, dipharunner.barcode[1], 0.0, 20.0)
    plt.show()

### PH of an alpha complex made with CGAL

TODO.