/*    This file is part of the Gudhi Library. The Gudhi library
 *    (Geometric Understanding in Higher Dimensions) is a generic C++
 *    library for computational topology.
 *
 *    Author(s):       Pawel Dlotko
 *
 *    Copyright (C) 2016 Inria
 *
 *    This program is free software: you can redistribute it and/or modify
 *    it under the terms of the GNU General Public License as published by
 *    the Free Software Foundation, either version 3 of the License, or
 *    (at your option) any later version.
 *
 *    This program is distributed in the hope that it will be useful,
 *    but WITHOUT ANY WARRANTY; without even the implied warranty of
 *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *    GNU General Public License for more details.
 *
 *    You should have received a copy of the GNU General Public License
 *    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

#ifndef DOC_GUDHI_STAT_H_
#define DOC_GUDHI_STAT_H_

namespace Gudhi {
namespace Persistence_representations {

/**  \defgroup Persistence_representations Persistence representations
 *
 * \author   Pawel Dlotko
 *
 * @{
 *\section Persistence_representations_idea Idea

 In order to perform most statistical tests and machine learning algorithms on data, one only needs to be able to
 perform a very limited number of operations on them. Let us fix a representation of
 data of a type A. To perform most statistical and machine learning operations, one needs to be able to compute
 the average of objects of type A (so that the averaged object is also of type A), to
 compute the distance between objects of type A, to vectorize an object of type A, and to compute the scalar product
 of a pair of objects of type A.

 To put this statement into context, let us assume we have two collections \f$ c_1,\ldots,c_n\f$ and
 \f$d_1,\ldots,d_n\f$ of objects of type A. We want to verify whether the averages of those two collections
 are different by performing a permutation test.
 First of all, we compute the averages of those two collections: C, the average of \f$ c_1,\ldots,c_n \f$, and D, the
 average of \f$d_1,\ldots,d_n\f$. Note that both C and D are of type A. Then we compute \f$d(C,D)\f$,
 the distance between C and D.
 Next we put the two collections into one bin:
 \f[B = \{ c_1,\ldots,c_n,d_1,\ldots,d_n \}\f]
 Then we shuffle B, and we divide the shuffled version of B into two classes: \f$B_1\f$ and \f$B_2\f$ (in this case, of
 the same cardinality). Then we compute the averages \f$\hat{B_1}\f$ and \f$\hat{B_2}\f$
 of the elements in \f$B_1\f$ and \f$B_2\f$. Note that again, \f$\hat{B_1}\f$ and \f$\hat{B_2}\f$ are of type A.
 Then we compute their distance \f$d(\hat{B_1},\hat{B_2})\f$. The procedure of shuffling and dividing the set \f$B\f$
 is repeated \f$N\f$ times (where \f$N\f$ is a reasonably large number).
 Then the p-value for the hypothesis that the averages of \f$c_1,\ldots,c_n\f$ and \f$d_1,\ldots,d_n\f$ are equal is
 approximated by the number of times \f$d(\hat{B_1},\hat{B_2}) > d(C,D)\f$ divided by \f$N\f$.

 The permutation test recalled above can be performed for any type A which can be averaged, and which allows for
 the computation of distances.
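
 The permutation test can be sketched in a few lines. The sketch below is only an illustration: it uses a plain
 vector of doubles as a stand-in for the type A, and the names Representation, average, euclidean_distance and
 permutation_test_p_value are hypothetical, not part of the library. Any type offering an average and a distance
 could be substituted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Toy stand-in for "type A": a plain vector of doubles (an assumption of this sketch).
using Representation = std::vector<double>;

// Coordinate-wise average; all objects are assumed to have the same length.
Representation average(const std::vector<Representation>& objects) {
  assert(!objects.empty());
  Representation mean(objects.front().size(), 0.0);
  for (const Representation& object : objects) {
    assert(object.size() == mean.size());
    for (std::size_t i = 0; i < mean.size(); ++i) mean[i] += object[i];
  }
  for (double& value : mean) value /= static_cast<double>(objects.size());
  return mean;
}

// Euclidean distance between two objects of the same length.
double euclidean_distance(const Representation& a, const Representation& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return std::sqrt(sum);
}

// Fraction of shuffles whose two halves have averages farther apart than C and D.
double permutation_test_p_value(const std::vector<Representation>& c, const std::vector<Representation>& d,
                                std::size_t number_of_shuffles, unsigned seed) {
  assert(number_of_shuffles > 0);
  const double observed = euclidean_distance(average(c), average(d));
  // Put both collections into one bin B.
  std::vector<Representation> bin(c);
  bin.insert(bin.end(), d.begin(), d.end());
  std::mt19937 generator(seed);
  std::size_t exceedances = 0;
  for (std::size_t shuffle = 0; shuffle < number_of_shuffles; ++shuffle) {
    std::shuffle(bin.begin(), bin.end(), generator);
    // Split the shuffled bin into B_1 and B_2, keeping the original cardinalities.
    const auto split = bin.begin() + static_cast<std::ptrdiff_t>(c.size());
    const std::vector<Representation> b1(bin.begin(), split);
    const std::vector<Representation> b2(split, bin.end());
    if (euclidean_distance(average(b1), average(b2)) > observed) ++exceedances;
  }
  return static_cast<double>(exceedances) / static_cast<double>(number_of_shuffles);
}
```

 A small returned value suggests that the two averages differ; the fixed seed only serves to make runs reproducible.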

 The Persistence\_representations package contains a collection of representations of persistent homology that
 implement the concepts described below:

 \li Concept of a representation of persistence that allows averaging (so that the average object is of the same type).
 \li Concept of representation of persistence that allows computations of distances.
 \li Concept of representation of persistence that allows computations of scalar products.
 \li Concept of representation of persistence that allows vectorization.
 \li Concept of representation of persistence that allows computations of real-valued characteristics of objects.


 At the moment, implementations of the following representations of persistence are available (further details of
 those representations will be discussed later):

 \li Exact persistence landscapes (allow averaging, computation of distances, scalar products, vectorizations and
 real-valued characteristics).
 \li Persistence landscapes on a grid (allow averaging, computation of distances, scalar products, vectorizations and
 real-valued characteristics).
 \li Persistence heat maps: various representations where one places a Gaussian kernel, weighted or not, at each point
 of the diagram (allow averaging, computation of distances, scalar products,
 vectorizations and real-valued characteristics).
 \li Persistence vectors (allow averaging, computation of distances, scalar products, vectorizations and real-valued
 characteristics).
 \li Persistence diagrams / barcodes (allow computation of distances, vectorizations and real-valued characteristics).


 Note that while functionalities like averaging, distances and scalar products are fixed, there is no canonical
 way of vectorizing and computing real-valued characteristics of objects. Therefore the
 procedures for vectorization and for computing real-valued characteristics are quite likely to evolve in future
 versions of the library.

 The main aim of this implementation is to be able to implement various statistical methods, both on the level of C++
 and on the level of Python. The methods will operate on the functionalities offered
 by concepts. That means that the statistical and ML methods will be able to operate on any representation that
 implements the required concept (including the ones that are not in the library at the moment).
 This provides a framework for topological statistics that is very easy to extend.

 Below we discuss the representations which are currently implemented in the Persistence\_representations package:

 \section sec_persistence_landscapes Persistence Landscapes
 <b>Reference manual:</b> \ref Gudhi::Persistence_representations::Persistence_landscape <br>
 Persistence landscapes were originally proposed by Bubenik in \cite bubenik_landscapes_2015. Efficient algorithms to
 compute them rigorously were proposed by Bubenik and Dlotko in \cite bubenik_dlotko_landscapes_2016. The idea of
 persistence landscapes is briefly summarized below.

 To begin with, suppose we are given a point \f$(b,d) \in \mathbb{R}^2\f$ in a
 persistence diagram. With this point, we associate a piecewise
 linear function \f$f_{(b,d)} : \mathbb{R} \rightarrow [0,\infty)\f$, which is
 defined as

  \f[f_{(b,d)}(x) =
  \left\{ \begin{array}{ccl}
            0     & \mbox{ if } & x \not\in (b, d) \; , \\
            x - b & \mbox{ if } & x \in \left( b, \frac{b+d}{2}
              \right] \; , \\
            d - x & \mbox{ if } & x \in \left(\frac{b+d}{2},
              d \right) \; .
  \end{array} \right.
 \f]

 A persistence landscape of the birth-death
 pairs \f$(b_i , d_i)\f$, where \f$i = 1,\ldots,m\f$, which constitute the given
 persistence diagram is the sequence of functions \f$\lambda_k : \mathbb{R} \rightarrow [0,\infty)\f$ for \f$k \in
 \mathbb{N}\f$, where \f$\lambda_k(x)\f$
 denotes the \f$k^{\rm th}\f$ largest value of the numbers \f$f_{(b_i,d_i)}(x)\f$,
 for \f$i = 1, \ldots, m\f$, and we define \f$\lambda_k(x) = 0\f$ if \f$k > m\f$.
 Equivalently, this sequence of functions can be combined into a single
 function \f$L : \mathbb{N} \times \mathbb{R} \to [0,\infty)\f$ of two
 variables, if we define \f$L(k,t) = \lambda_k(t)\f$.
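
 The definition above translates directly into a naive evaluation routine. The following sketch is not the
 algorithm of \cite bubenik_dlotko_landscapes_2016; it simply evaluates all tent functions at one point and selects
 the k-th largest value. The function names tent and landscape_value are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Tent function f_{(b,d)}(x): zero outside (b,d), slope 1 up to the midpoint, slope -1 afterwards.
double tent(double b, double d, double x) {
  if (x <= b || x >= d) return 0.0;
  return std::min(x - b, d - x);
}

// lambda_k(x) for k = 1, 2, ...: the k-th largest tent value at x, or 0 when k exceeds the number of pairs.
double landscape_value(const std::vector<std::pair<double, double>>& diagram, std::size_t k, double x) {
  assert(k >= 1);
  if (k > diagram.size()) return 0.0;
  std::vector<double> values;
  values.reserve(diagram.size());
  for (const auto& pair : diagram) values.push_back(tent(pair.first, pair.second, x));
  // Partial selection in decreasing order: the k-th largest value ends up at position k - 1.
  std::nth_element(values.begin(), values.begin() + static_cast<std::ptrdiff_t>(k - 1), values.end(),
                   std::greater<double>());
  return values[k - 1];
}
```

 For the diagram consisting of the pairs (0,4) and (1,3), the two tents take the values 2 and 1 at x = 2, so
 the first landscape function equals 2 there and the second equals 1.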

 A detailed description of the algorithms used to compute persistence landscapes can be found in \cite bubenik_dlotko_landscapes_2016.
 Note that this implementation provides an exact representation of landscapes. This has many advantages, but also a few drawbacks.
 For instance, as discussed in \cite bubenik_dlotko_landscapes_2016, the exact representation of a landscape may be of quadratic size with respect
 to the input persistence diagram. For very large diagrams, this representation may therefore be memory-prohibitive.
 In such a case, there are two possible ways to proceed:

 \li Use the representation on a grid (see Section \ref sec_landscapes_on_grid).
 \li Compute only a given number of initial nonzero landscapes. This option is available at the C++ level as the last
 parameter of the constructor of a persistence landscape (set by default to std::numeric_limits<size_t>::max()).


 \section sec_landscapes_on_grid Persistence Landscapes on a grid

 <b>Reference manual:</b> \ref Gudhi::Persistence_representations::Persistence_landscape_on_grid <br>
 <b>Reference manual:</b> \ref Gudhi::Persistence_representations::Persistence_landscape_on_grid_exact <br>

 Here, we provide alternative, non-exact representations of the persistence landscapes defined in Section \ref sec_persistence_landscapes.
 Unlike in Section \ref sec_persistence_landscapes, we build representations of persistence landscapes by evaluating the landscape functions on a finite, equally spaced grid of points.
 We propose two different representations depending on whether the persistence intervals are also mapped onto the grid (Persistence_landscape_on_grid) or not (Persistence_landscape_on_grid_exact).
 This makes a big difference, since mapping the intervals onto the grid reduces the computation time but only provides an approximation of the landscape values.

 Since persistence landscapes originating from persistence diagrams have slope \f$1\f$ or \f$-1\f$, we have an
 estimate of the region between the grid points where the landscapes can be located.
 This allows one to estimate the error made when performing various operations on landscapes. Note that for average
 landscapes the slope is in the range \f$[-1,1]\f$ and similar estimates can be used.

 Due to the lack of a rigorous description of the algorithms for these non-rigorous representations of persistence
 landscapes in the literature, we provide a short discussion below.

 Let us assume that we want to compute a persistence landscape on an interval \f$[x,y]\f$, using a grid determined by
 a positive integer \f$N\f$.
 Then we sample the persistence landscape at the equally spaced points
 \f$x_0 = x , x_1 = x + \frac{y-x}{N}, \ldots , x_{N} = y\f$.
 Persistence landscapes are represented as a vector of
 vectors of real numbers. Assume that the i-th vector consists of \f$n_i\f$ numbers sorted from larger to smaller. They
 represent the values of the functions
 \f$\lambda_1,\ldots,\lambda_{n_i}\f$ (\f$\lambda_{n_i+1}\f$ and the functions with larger indices are then zero
 functions) at the i-th point of the grid, i.e. \f$x + i \frac{y-x}{N}\f$.
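
 As an illustration of this data layout, the sketch below samples a landscape at the points
 \f$x + i \frac{y-x}{N}\f$ for \f$i = 0,\ldots,N\f$ (this index range is an assumption of the sketch) and stores the
 nonzero values at each point in decreasing order. The alias Grid_landscape and the function
 sample_landscape_on_grid are hypothetical names and do not reflect the interface of Persistence_landscape_on_grid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// values[i][k] is the value of the (k+1)-th landscape function at the i-th grid point.
using Grid_landscape = std::vector<std::vector<double>>;

Grid_landscape sample_landscape_on_grid(const std::vector<std::pair<double, double>>& diagram, double x, double y,
                                        std::size_t n) {
  assert(n >= 1 && x < y);
  Grid_landscape values(n + 1);
  const double step = (y - x) / static_cast<double>(n);
  for (std::size_t i = 0; i <= n; ++i) {
    const double point = x + static_cast<double>(i) * step;
    for (const auto& pair : diagram) {
      // Tent function of the pair (b,d) evaluated at the grid point, clamped at zero.
      const double value = std::max(0.0, std::min(point - pair.first, pair.second - point));
      if (value > 0.0) values[i].push_back(value);
    }
    // Sorting in decreasing order puts the k-th largest value at index k - 1.
    std::sort(values[i].begin(), values[i].end(), std::greater<double>());
  }
  return values;
}
```

 Only the nonzero values are stored, so vectors at different grid points may have different lengths.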

 When averaging two persistence landscapes represented on a grid, we need to make sure that they are defined on
 compatible grids, i.e. the intervals \f$[x,y]\f$ on which they are defined are
 the same, and the numbers of grid points \f$N\f$ are the same in both cases. If this is the case, we simply compute
 point-wise averages of the entries of the corresponding
 vectors (in this whole section we assume that if one vector of numbers is shorter than the other, we extend the shorter
 one with zeros so that they have the same length).
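
 A minimal sketch of this averaging for two landscapes on compatible grids is given below. It assumes the
 vector-of-vectors layout described above; the alias Grid_landscape and the function average_on_grid are
 hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Grid_landscape = std::vector<std::vector<double>>;

// Point-wise average of two landscapes sampled on the same grid.
Grid_landscape average_on_grid(const Grid_landscape& first, const Grid_landscape& second) {
  assert(first.size() == second.size());  // compatible grids have the same number of points
  Grid_landscape result(first.size());
  for (std::size_t i = 0; i < first.size(); ++i) {
    const std::size_t levels = std::max(first[i].size(), second[i].size());
    result[i].resize(levels, 0.0);
    for (std::size_t k = 0; k < levels; ++k) {
      // The shorter vector is implicitly extended with zeros.
      const double a = k < first[i].size() ? first[i][k] : 0.0;
      const double b = k < second[i].size() ? second[i][k] : 0.0;
      result[i][k] = 0.5 * (a + b);
    }
  }
  return result;
}
```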

 The computation of distances between two persistence landscapes on a grid is not much different from the rigorous
 case. In this case, we sum up the distances between the same levels of
 corresponding landscapes. For a fixed level, we approximate the landscapes between consecutive
 grid points by linear functions, and compute the \f$L^p\f$ distance between them.

 As in the case of distances, when computing the scalar product of two persistence landscapes on a grid, we sum up
 the scalar products of corresponding levels of landscapes. For each level,
 we assume that the persistence landscape on a grid between two grid points is approximated by a linear function.
 Therefore, to compute the scalar product of two corresponding levels of landscapes,
 we sum up the integrals of products of line segments for every pair of consecutive grid points.
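
 For two linear functions going from \f$a_0\f$ to \f$a_1\f$ and from \f$b_0\f$ to \f$b_1\f$ over a grid cell of length
 \f$h\f$, the integral of their product equals \f$\frac{h}{6}(2a_0b_0 + a_0b_1 + a_1b_0 + 2a_1b_1)\f$. The sketch
 below sums this quantity over all levels and all grid cells. The names Grid_landscape, level_value and
 scalar_product_on_grid are hypothetical, and the grid step is passed explicitly as an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Grid_landscape = std::vector<std::vector<double>>;

// Value of a level (0-based) at a grid point; levels that are not stored are zero functions.
double level_value(const Grid_landscape& landscape, std::size_t point, std::size_t level) {
  return level < landscape[point].size() ? landscape[point][level] : 0.0;
}

double scalar_product_on_grid(const Grid_landscape& first, const Grid_landscape& second, double step) {
  assert(first.size() == second.size() && step > 0.0);
  std::size_t levels = 0;
  for (std::size_t i = 0; i < first.size(); ++i) {
    levels = std::max({levels, first[i].size(), second[i].size()});
  }
  double result = 0.0;
  for (std::size_t level = 0; level < levels; ++level) {
    for (std::size_t i = 0; i + 1 < first.size(); ++i) {
      const double a0 = level_value(first, i, level);
      const double a1 = level_value(first, i + 1, level);
      const double b0 = level_value(second, i, level);
      const double b1 = level_value(second, i + 1, level);
      // Exact integral of the product of two linear functions over one grid cell.
      result += step / 6.0 * (2.0 * a0 * b0 + a0 * b1 + a1 * b0 + 2.0 * a1 * b1);
    }
  }
  return result;
}
```

 For a single tent of height 1 sampled at three points with step 1, the scalar product with itself is
 \f$\frac{2}{3}\f$, which agrees with the integral of the squared tent function.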

 Note that for these representations we need to specify a few parameters:

 \li Begin and end point of a grid -- the interval \f$[x,y]\f$ (real numbers).
 \li Number of points in a grid (positive integer \f$N\f$).


 Note that the same representation is used in the TDA R-package \cite Fasy_Kim_Lecci_Maria_tda.

 \section sec_persistence_heat_maps Persistence heat maps

 <b>Reference manual:</b> \ref Gudhi::Persistence_representations::Persistence_heat_maps <br>
 <b>Reference manual:</b> \ref Gudhi::Persistence_representations::Persistence_heat_maps_exact <br>

 This is a general class of discrete structures which are based on the idea of placing a kernel at the points of persistence diagrams.
 This idea has appeared in the work of many authors over the last 15 years. As far as we know, it was first described
 in the work of the Bologna group in \cite Ferri_Frosini_comparision_sheme_1 and \cite Ferri_Frosini_comparision_sheme_2.
 Later it was described by the Colorado State University group in \cite Persistence_Images_2017. That paper
 provided, for the first time, a discussion of the stability of this representation.
 The same ideas are also used in the construction of two recent kernels used for machine learning:
 \cite Kusano_Fukumizu_Hiraoka_PWGK and \cite Reininghaus_Huber_ALL_PSSK. Both kernels use
 interesting ideas to ensure stability of the representations with respect to the 1-Wasserstein metric. In the kernel
 presented in \cite Kusano_Fukumizu_Hiraoka_PWGK, a scaling function is used to multiply the Gaussian kernel in such a
 way that the points close to the diagonal have low weights and consequently do not have a big influence on the resulting
 distribution. In \cite Reininghaus_Huber_ALL_PSSK, for every point \f$(b,d)\f$ two Gaussian kernels
 are added: the first, with weight 1, at the point \f$(b,d)\f$, and the second, with weight -1, at the mirrored point \f$(d,b)\f$.
 In both cases, the representations are stable with respect to the 1-Wasserstein distance.

 In the Persistence_representations package, we currently implement a discretization of the distributions described above.
 The base of this implementation is a 2-dimensional array of pixels. Each pixel is assigned a real value which
 is the sum of the distribution values induced by each point of the persistence diagram.
 As for Persistence_landscapes, we propose two different representations depending on whether the persistence intervals are also mapped onto the pixels
 (Persistence_heat_maps) or not (Persistence_heat_maps_exact).
 At the moment we compute the sum over the evaluations of the distributions at the pixel centers. This can easily be extended to any other function
 (for instance the sum of the integrals of the distributions over the pixels).

 Concerning Persistence_heat_maps, the parameters that determine the structure are the following:

 \li A positive integer k determining the size of the kernel we use (we always assume that the kernels are square).
 \li A filter: in practice a square matrix of size \f$(2k+1) \times (2k+1)\f$. By default, this is a discretization of
 the N(0,1) kernel.
 \li The box \f$[x_0,x_1]\times [y_0,y_1]\f$ bounding the domain of the persistence image.
 \li A scaling function. Each Gaussian kernel at point \f$(p,q)\f$ gets multiplied by the value of this function at the
 point \f$(p,q)\f$.
 \li A boolean value determining whether the space below the diagonal should be erased or not. To be precise: when
 points close to the diagonal are given, the kernel sometimes has support that reaches the region
 below the diagonal. If the value of this parameter is true, then the values below the diagonal are erased.

 Concerning Persistence_heat_maps_exact, only Gaussian kernels are implemented, so the parameters are the array of pixels, the weight functions for the Gaussians and the bandwidth of the Gaussians. 
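
 The evaluation at pixel centers can be sketched as follows. This is only an illustration under several assumptions:
 the Gaussian is isotropic and unnormalized, the pixel array is square, and the names Point, Image, gaussian and
 heat_map are hypothetical. It does not reproduce the interface of Persistence_heat_maps_exact.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;
using Image = std::vector<std::vector<double>>;

// Unnormalized isotropic Gaussian centered at `center`, evaluated at `at` (normalization omitted in this sketch).
double gaussian(const Point& center, const Point& at, double bandwidth) {
  const double dx = at.first - center.first;
  const double dy = at.second - center.second;
  return std::exp(-(dx * dx + dy * dy) / (2.0 * bandwidth * bandwidth));
}

// image[row][column] holds the weighted sum of Gaussians evaluated at the center of that pixel.
Image heat_map(const std::vector<Point>& diagram, double x0, double x1, double y0, double y1,
               std::size_t pixels_per_side, double bandwidth,
               const std::function<double(const Point&)>& weight) {
  assert(pixels_per_side >= 1 && x0 < x1 && y0 < y1 && bandwidth > 0.0);
  const double width = (x1 - x0) / static_cast<double>(pixels_per_side);
  const double height = (y1 - y0) / static_cast<double>(pixels_per_side);
  Image image(pixels_per_side, std::vector<double>(pixels_per_side, 0.0));
  for (std::size_t row = 0; row < pixels_per_side; ++row) {
    for (std::size_t column = 0; column < pixels_per_side; ++column) {
      const Point center(x0 + (static_cast<double>(column) + 0.5) * width,
                         y0 + (static_cast<double>(row) + 0.5) * height);
      for (const Point& p : diagram) image[row][column] += weight(p) * gaussian(p, center, bandwidth);
    }
  }
  return image;
}
```

 Passing a weight function that vanishes near the diagonal reduces the influence of low-persistence points, in the
 spirit of the scaling function described above.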

 \section sec_persistence_vectors Persistence vectors
 <b>Reference manual:</b> \ref Gudhi::Persistence_representations::Vector_distances_in_diagram <br>
 This is a representation of persistent homology in the form of a vector, which was designed for an application in 3D
 graphics in \cite Carriere_Oudot_Ovsjanikov_top_signatures_3d. Below we provide a short description of this
 representation.

 Given a persistence diagram \f$D = \{ (b_i,d_i) \}\f$, for every pair of birth--death points \f$(b_1,d_1)\f$ and
 \f$(b_2,d_2)\f$ we compute the following three distances:

 \li \f$d( (b_1,d_1) , (b_2,d_2) )\f$.
 \li \f$d( (b_1,d_1) , (\frac{b_1+d_1}{2},\frac{b_1+d_1}{2}) )\f$.
 \li \f$d( (b_2,d_2) , (\frac{b_2+d_2}{2},\frac{b_2+d_2}{2}) )\f$.

 We pick the smallest of those and add it to a vector. The obtained vector of numbers is then sorted in decreasing
 order. This way we obtain a persistence vector representing the diagram.

 Given two persistence vectors, the computation of distances, averages and scalar products is straightforward. The
 average is simply the coordinate-wise average of a collection of vectors. In this section we
 assume that the vectors are extended by zeros if they are of different sizes. To compute distances, we use the
 absolute values of the differences between coordinates. The scalar product is the sum of products of
 values at the corresponding positions of two vectors.
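
 The construction of a persistence vector and the scalar product of two such vectors can be sketched as follows.
 The sketch assumes that \f$d\f$ is the Euclidean distance in the plane, which the text above does not fix, and the
 names Point, euclidean, diagonal_projection, persistence_vector and scalar_product are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;

// Euclidean distance in the plane (an assumption: the text leaves the distance d unspecified).
double euclidean(const Point& p, const Point& q) {
  return std::hypot(p.first - q.first, p.second - q.second);
}

// The point ((b+d)/2, (b+d)/2) on the diagonal associated with (b,d).
Point diagonal_projection(const Point& p) {
  const double middle = 0.5 * (p.first + p.second);
  return {middle, middle};
}

std::vector<double> persistence_vector(const std::vector<Point>& diagram) {
  std::vector<double> result;
  for (std::size_t i = 0; i < diagram.size(); ++i) {
    for (std::size_t j = i + 1; j < diagram.size(); ++j) {
      // Smallest of the three distances listed above for the pair (i, j).
      result.push_back(std::min({euclidean(diagram[i], diagram[j]),
                                 euclidean(diagram[i], diagonal_projection(diagram[i])),
                                 euclidean(diagram[j], diagonal_projection(diagram[j]))}));
    }
  }
  std::sort(result.begin(), result.end(), std::greater<double>());
  return result;
}

// Scalar product; the shorter vector is implicitly extended by zeros, which contribute nothing.
double scalar_product(const std::vector<double>& u, const std::vector<double>& v) {
  double sum = 0.0;
  for (std::size_t i = 0; i < std::min(u.size(), v.size()); ++i) sum += u[i] * v[i];
  return sum;
}
```

 A diagram with m points yields a vector with m(m-1)/2 entries, one per pair of points.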


\section sec_persistence_kernels Kernels on persistence diagrams
 <b>Reference manual:</b> \ref Gudhi::Persistence_representations::Sliced_Wasserstein <br>
 <b>Reference manual:</b> \ref Gudhi::Persistence_representations::Persistence_weighted_gaussian <br>

 Kernels for persistence diagrams can be regarded as infinite-dimensional vectorizations. More specifically,
 they are similarity functions whose evaluations on pairs of persistence diagrams equal the scalar products
 between images of these pairs under a map \f$\Phi\f$ taking values in a specific (possibly non-Euclidean) Hilbert space: \f$k(D_i, D_j) = \langle \Phi(D_i),\Phi(D_j)\rangle\f$.
 Conversely, classical results of learning theory ensure that such a \f$\Phi\f$ exists for a given similarity function \f$k\f$ if and only if \f$k\f$ is <i>positive semi-definite</i>.
 Kernels are designed for algorithms that can be <i>kernelized</i>, i.e., algorithms that only require knowledge of the scalar products between instances in order to run.
 Examples of such algorithms include Support Vector Machines, Principal Component Analysis and Ridge Regression. 

 There have been several attempts at defining kernels, i.e., positive semi-definite functions, between persistence diagrams within the last few years. We provide implementations
 of three of them:

 \li the <i>Persistence Scale Space Kernel</i>---see \cite Reininghaus_Huber_ALL_PSSK, which is the classical scalar product between \f$L^2\f$ functions, where persistence diagrams
 are turned into functions by centering and summing Gaussian functions over the diagram points and their symmetric counterparts w.r.t. the diagonal: \f$k(D_1,D_2)=\int \Phi(D_1)\Phi(D_2)\f$,
 where \f$\Phi(D)=\sum_{p\in D} {\rm exp}\left(-\frac{\|p-\cdot\|_2^2}{2\sigma^2}\right)\f$. 
 
 \li the <i>Persistence Weighted Gaussian Kernel</i>---see \cite Kusano_Fukumizu_Hiraoka_PWGK, which is a slight generalization of the previous kernel, is the scalar product between
 weighted Kernel Mean Embeddings of persistence diagrams w.r.t. the Gaussian Kernel \f$k_G\f$ (with corresponding map \f$\Phi_G\f$) in \f$\mathbb{R}^2\f$: 
 \f$k(D_1,D_2)=\langle\sum_{p\in D_1} w(p)\Phi_G(p), \sum_{q\in D_2} w(q)\Phi_G(q)\rangle\f$
 
 \li the  <i>Sliced Wasserstein Kernel</i>---see \cite pmlr-v70-carriere17a, which takes the form of a Gaussian kernel with a specific distance between persistence diagrams
     called the <i>Sliced Wasserstein Distance</i>: \f$k(D_1,D_2)={\rm exp}\left(-\frac{SW(D_1,D_2)}{2\sigma^2}\right)\f$
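
 Since \f$\langle \Phi_G(p), \Phi_G(q) \rangle = k_G(p,q)\f$, the Persistence Weighted Gaussian Kernel above reduces
 to a double sum over the points of both diagrams. The sketch below evaluates this double sum directly and also
 computes the distance induced by the kernel,
 \f$\sqrt{k(D_1,D_1) + k(D_2,D_2) - 2k(D_1,D_2)}\f$. It assumes the unnormalized Gaussian kernel
 \f$k_G(p,q) = {\rm exp}\left(-\frac{\|p-q\|_2^2}{2\sigma^2}\right)\f$; the weight function is left to the caller,
 and the names Point, Weight, gaussian_kernel, pwg_kernel and kernel_distance are hypothetical, not the interface of
 Persistence_weighted_gaussian.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;
using Weight = std::function<double(const Point&)>;

// Unnormalized Gaussian kernel on the plane (assumed form of k_G).
double gaussian_kernel(const Point& p, const Point& q, double sigma) {
  const double dx = p.first - q.first;
  const double dy = p.second - q.second;
  return std::exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma));
}

// k(D1, D2) = sum over p in D1 and q in D2 of w(p) * w(q) * k_G(p, q).
double pwg_kernel(const std::vector<Point>& d1, const std::vector<Point>& d2, double sigma, const Weight& weight) {
  assert(sigma > 0.0);
  double sum = 0.0;
  for (const Point& p : d1) {
    for (const Point& q : d2) sum += weight(p) * weight(q) * gaussian_kernel(p, q, sigma);
  }
  return sum;
}

// Distance induced by the kernel; the maximum guards against tiny negative values caused by rounding.
double kernel_distance(const std::vector<Point>& d1, const std::vector<Point>& d2, double sigma,
                       const Weight& weight) {
  const double squared = pwg_kernel(d1, d1, sigma, weight) + pwg_kernel(d2, d2, sigma, weight) -
                         2.0 * pwg_kernel(d1, d2, sigma, weight);
  return std::sqrt(std::max(0.0, squared));
}
```

 The direct double sum costs a number of kernel evaluations equal to the product of the sizes of the two diagrams.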

 When launching:

 \code $> ./Sliced_Wasserstein
 \endcode
 
 the program output is:
 
 \code $> Approx SW distance: 5.33648
 $> Exact SW distance: 5.33798
 $> Approx SW kernel: 0.0693743
 $> Exact  SW kernel: 0.0693224
 $> Distance induced by approx SW kernel: 1.36428
 $> Distance induced by exact  SW kernel: 1.3643 
 \endcode


 and when launching:

 \code $> ./Persistence_weighted_gaussian
 \endcode
 
 the program output is:
 
 \code $> Approx PWG kernel: 1.21509
 $> Exact  PWG kernel: 1.13628
 $> Distance induced by approx PWG kernel: 3.23354
 $> Distance induced by exact  PWG kernel: 3.25697
 $> Approx Gaussian PWG kernel: 0.0194222
 $> Exact  Gaussian PWG kernel: 0.0192524
 $> Approx PSS kernel: 0.134413
 $> Exact  PSS kernel: 0.133394
 \endcode

 */
/** @} */  // end defgroup Persistence_representations

}  // namespace Persistence_representations
}  // namespace Gudhi

#endif  // DOC_GUDHI_STAT_H_