Introduction
=====

This package is a cleaned-up subset of the hodgepodge of Python glue
that I regularly use to massage data into and out of persistent
homology and other TDA computations. It does *not* compute anything
itself, and exists primarily to marshal data into and out of the
formats used by the excellent [DIPHA](https://github.com/DIPHA/dipha)
software, in order to make it easier to pre- and post-process data in
Python.

Caveats
-----

I decided to clean up and release a subset of these scripts to make
life slightly easier for those who compute persistent homology and
prefer to manipulate data in Python. The scripts come with the
following caveats:

- The scripts were too messy to release as they were, and the cleaning
  up process means that this is now essentially untested software
  again. Beware.
- There is nothing of substance here. This is just glue around DIPHA.
- I use Python as modern Perl, and my experience with it is limited to
  quickly manipulating data without too much thought for writing
  structured software. Beware.
- I make no attempt to accurately reflect the full capabilities of
  DIPHA.
- Since Python will let you write `Interval("banana", 5)`, so will I,
  and you and your persistence generator "from banana to 5" can go
  solve problems down the road.


Installation
-----

The scripts require NumPy and Matplotlib. If you want them to manage
running DIPHA as well, then you of course need working `dipha` and
`mpirun` executables. If the environment variables `DIPHA` and/or
`MPIRUN` are set, their values specify these executables. Otherwise,
the executables are searched for in `PATH`.
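
The lookup order described above can be sketched as follows. This is only an illustration of the rule "environment variable first, then PATH", not the package's actual code; the helper name `find_executable` is hypothetical.

```python
import os
import shutil


def find_executable(env_var, default_name):
    """Return the executable named by env_var if set, else search PATH.

    Hypothetical helper, for illustration only.
    """
    override = os.environ.get(env_var)
    if override:
        return override
    # shutil.which returns None when nothing on PATH matches.
    return shutil.which(default_name)


dipha_path = find_executable("DIPHA", "dipha")
mpirun_path = find_executable("MPIRUN", "mpirun")
print(dipha_path, mpirun_path)
```

If either value comes back as `None`, the corresponding executable was neither specified nor found on PATH.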

Todo/missing
----

* Support for general filtered complexes (DIPHA's
  `DIPHA_WEIGHTED_BOUNDARY_MATRIX`) to allow for non-flag
  complexes. This has high priority, and should be added soon.
* More flexible plotting.


Use
=====

I primarily use these scripts to convert to and from DIPHA's file
formats. The `DiphaRunner` class is available for those who would like
to avoid manually running DIPHA.

I am not aiming for great performance, efficiency or generality with
these scripts. If you want any of those, you are better off writing the
DIPHA files yourself. The scripts are intended to be easy to use.


Examples
-----

### Saving a complete weighted graph in DIPHA's format

This is as simple as

    import phstuff.diphawrapper as dipha
    import numpy as np
    
    weights = np.random.uniform(0, 1, (100, 100)) # The graph.
    dipha.save_weight_matrix("weights.dipha", weights)
    
DIPHA can now be run on "weights.dipha".
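
A uniformly random matrix like the one above is generally not symmetric, whereas the weights of an undirected graph are. Whether `save_weight_matrix` reads the whole matrix or only one triangle is not stated here, so a cautious sketch is to symmetrize the matrix first. The code below is plain NumPy and uses nothing from this package:

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.uniform(0, 1, (100, 100))

# Average each entry with its mirror image so that
# weights[i, j] == weights[j, i] for all i, j.
weights = (raw + raw.T) / 2

print(np.array_equal(weights, weights.T))
```

The resulting `weights` array can then be passed to `save_weight_matrix` exactly as in the example above.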

### Loading a DIPHA barcode file

To load the dimension-`1` persistence diagram that DIPHA stored in
"out.dipha", do:

    import phstuff.diphawrapper as dipha

    barcode = dipha.load_barcode("out.dipha")
    for interval in barcode[1]:
        print(interval)

### Excluding edges/simplices above a certain weight

If we want to exclude all edges with weights above some threshold in
the filtration, we can either use `save_edge_list` or
`save_masked_weight_matrix`. An example of the latter is shown below:

    import phstuff.diphawrapper as dipha
    import numpy as np
    import numpy.ma as ma
    
    weights = np.random.uniform(0, 1, (100, 100))
    masked = ma.masked_greater(weights, 0.5) # All weights above 0.5 are
                                             # masked out and will not be
                                             # present in the graph,
                                             # effectively ending the
                                             # filtration at 0.5.
    dipha.save_masked_weight_matrix("weights.dipha", masked)

For more information about masked arrays, see
[the NumPy documentation](https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html). Do
remember that the masked entries are the edges that will *not* be
present in the graph.
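
To see which entries a mask removes before writing anything to disk, it can help to inspect a small matrix. The following sketch uses only NumPy; the 3x3 weights are made up for illustration.

```python
import numpy as np
import numpy.ma as ma

weights = np.array([[0.0, 0.2, 0.9],
                    [0.2, 0.0, 0.4],
                    [0.9, 0.4, 0.0]])
masked = ma.masked_greater(weights, 0.5)

# True marks a masked entry, i.e. an edge that is left out of the graph.
print(ma.getmaskarray(masked))
# Number of entries that remain unmasked.
print(masked.count())
```

Here only the two entries holding 0.9 exceed the threshold, so they are the only ones masked.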


### Running DIPHA from Python

If you prefer to control the entire computation from Python, these
scripts can generate the necessary temporary files and run DIPHA on
them. For this, the mpirun and dipha executables must be in the PATH
environment variable, or they must be specified through the MPIRUN and
DIPHA environment variables.

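    import matplotlib.pyplot as plt
    import numpy as np

    import phstuff.barcode as bc  # Assumed module path for `bc`; adjust if it differs.
    import phstuff.diphawrapper as dipha

    weights = np.random.uniform(0, 1, (100, 100))
    dipharunner = dipha.DiphaRunner(2) # Compute up to 2-simplices.
    dipharunner.weight_matrix(weights)
    dipharunner.run()

    # Print the dimension-1 intervals.
    for interval in dipharunner.barcode[1]:
        print(interval)

    # Plot the dimension-1 barcode over the range of the weights.
    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    bc.plot(ax, dipharunner.barcode[1], weights.min(), weights.max())
    plt.show()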



### PH of an alpha complex made with CGAL

TODO.