Notebook source code: notebooks/17_foundations__stratified_spaces.ipynb
Run it yourself on binder

# Stratified spaces#

$$\textbf{Lead Author: Anna Calissano}$$

Dear lerner, the aim of the current notebook is to introduce stratified spaces and its implementation within geomstats.

## Spider#

The $$k$$-Spider consists of $$k$$ copies of the positive real line $$\mathbb{R}_{\geq 0}$$ glued together at the origin. Within geomstats, we defined the following: 1. Spider Point: a point object defining the ray and the value 2. Spider: the space defined by the number of rays 3. Spider Geometry: by chosing a metric on the rays, we can define a metric on the whole space

 In [1]:

import warnings

import geomstats.backend as gs

from geomstats.geometry.stratified.spider import Spider, SpiderPoint, SpiderMetric
from geomstats.geometry.euclidean import EuclideanMetric

warnings.filterwarnings("ignore")
gs.random.seed(2020)

INFO: Using numpy backend


We can define a spider with $$k=3$$ rays (strata) and sample two points from it.

 In [2]:

sp = Spider(n_rays=3)
sp.n_rays

 Out [2]:

3

 In [3]:

sp_points = sp.random_point(n_samples=2)
sp_points

 Out [3]:

[r0: 0.0, r0: 0.0]


The points are represented into the SpiderPoint format, where the first input is the stratum and the second input is the value along the stratum.

 In [4]:

print(sp_points[0].stratum)
print(sp_points[0].stratum_coord)

0
0.0


Given a metric $$d_{rays}$$ on the strata (the rays), we can extend it to the whole space by

$d_{Spider}(s_1,s_2)=d_{rays}(s_1,0) + d_{rays}(0,s_2)$
 In [5]:

sp_metric = SpiderMetric(ray_metric=EuclideanMetric(dim=1), space=sp)


Given two points on the Spider, we can compute the distance between the two points as well as the geodesic between the two.

 In [6]:

sp_metric.dist(sp_points[0], sp_points[1])

 Out [6]:

0.0

 In [7]:

sp_geo=sp_metric.geodesic(sp_points[0], sp_points[1])
print(sp_geo(0), sp_geo(0.5), sp_geo(1))

[r0: 0.0] [r0: 0.0] [r0: 0.0]


## Graph Space#

Graph Space is a space defined to describe set of graphs with a finite number of nodes which can be both node labelled or node unlabelled.

Inspired by: A. Calissano, A. Feragen, S. Vantini, Populations of unlabeled networks: Graph space geometry and geodesic principal components, MOX Report (2020)

We consider graphs as triples $$G=(V,E,a)$$, where the node set $$V$$ has at most $$n$$ elements, and the edge set $$E \subset V^2$$ has maximal size $$n^2$$. The nodes and edges are attributed with elements of an attribute space $$A$$, which is considered to be Euclidean, via an attribute map $$a \colon E \rightarrow A$$. Here, the map $$a$$ allows us to describe attributes on both edges and nodes, as we use self loop edges (diagonal elements in the graphs adjacency matrix) to assign attributes to nodes. A graph with scalar attributes is completely specified by a weighted adjacency matrix of dimension $$n\times n$$, residing in a space $$X=\mathbb{R}^{n^2}$$ of flattened adjacency matrices. If the attributes are vectors of dimension $$d$$, the graph is represented by a tensor of dimension $$n\times n\times d$$, residing in a space $$X=\mathbb{R}^{n\times n\times d}$$.

 In [8]:

import networkx as nx
import matplotlib.pyplot as plt

from geomstats.geometry.stratified.graph_space import GraphPoint, GraphSpace, GraphSpaceMetric


### Graph#

Consider a graph with $$n=3$$ nodes and $$A=\mathbb{R}$$ scalar attributes on nodes and edges. The class Graph describe a single graph via its adjacency matrix.

 In [9]:

adj = gs.array([[10,3,1],[3,2,4],[1,4,5]])

 Out [9]:

array([[10,  3,  1],
[ 3,  2,  4],
[ 1,  4,  5]])


To simplify the visualization and the access to different methods, the Graph object can be turned into a networkx graph.

 In [10]:

gr_nx = gr.to_networkx()
edges, weights = zip(*nx.get_edge_attributes(gr_nx,'weight').items())
pos = nx.spring_layout(gr_nx)
nx.draw(gr_nx, pos, node_color='b', edgelist=edges, edge_color=weights, width=5.0, edge_cmap=plt.cm.Blues)


### Set of Graphs: GraphSpace#

Graphs can have different numbers of nodes and different node labels or order. We assume the existence across the populations of at most $$n$$ distinct nodes and we add fictionally null nodes to smaller networks, so that all graphs can be described by a fixed-size adjacency matrix. Graph Space is initalized by the maximal number of nodes in the set.

 In [11]:

g_s = GraphSpace(n_nodes= 4)
g_s.n_nodes

 Out [11]:

4


Within GraphSpace, we can sample points from random adjacency matrices, we can check if the points belongs and we can return a set of adjacency matrices.

 In [12]:

gr_sampled = g_s.random_point(2)

 In [13]:

g_s.belongs(gr)

 Out [13]:

True


### Labelled or Unlabelled?#

Graphs can be considered labelled or unlablled, meaning that the observation in the set can share the same nodes (labelled) or not (unlabelled). We can describe both cases into Graph Space by using a little trick.

To deal with unlabelled nodes, alignment two graphs corresponds to finding optimal permutations of their nodes. By allowing node permutation, a concept of equivalence class is introduces (i.e., every network is associated to the set of networks obtained by permuting the original one). In geometrical terms, this can be described as a quotient space obtained by applying a permutation action to the set of adjacency matrix.

In details, the group of node permutations $$T$$ can be represented via permutation matrices, acting on $$X$$ through matrix multiplication. The binary operation:

$\cdot \colon T \times X \rightarrow X, (T,x)\mapsto Tx$

thus defines an action of the group $$T$$ on $$X$$. The obtained quotient space $$X/T$$ is called graph space, and each element of $$X/T$$ is an unlabelled graph $$G$$, represented as an equivalence class $$[x]=Tx$$ which contains all the flattened adjacency matrices in $$X$$ which can be obtained from $$x$$ by permuting nodes. The map $$\pi \colon X \to X/T$$ given by $$\pi(x) = [x]$$ can be thought of as a projection of the Euclidean total space $$X$$ onto the graph space $$X/T$$, and the total space $$X$$ plays a similar role relative to graph space, as the tangent space does for manifolds, by providing a Euclidean space in which approximate computations can be carried out and projected back onto the space of interest – in our case the graph space $$X/T$$.

To deal with labelled nodes, we restrict the set of permutation matrices to the identity: $$T=\{I\}$$

### Graph Space Metric#

To define a metric on graph space, we need to chose a metric on the total space. Any metric $$d_X$$ on $$X$$ defines a quotient pseudo-metric

$d_{X/T}([x_1],[x_2])=\min_{t\in T}d_X(x_1,t^Tx_2t)$

on $$X/T$$. Since the permutation group $$T$$ is finite, $$d_{X/T}$$ is a metric, and the graph space $$X/T$$ is a geodesic space. In the implementation, we suppose that the default metric in the total space is the Frobenius norm between adjacency matrix.

 In [14]:

gs_m = GraphSpaceMetric(space=g_s)
gs_m.total_space_metric

 Out [14]:

<geomstats.geometry.matrices.MatricesMetric at 0x20cff5efd60>


### Graph to Graph Alignment#

The metric on Graph Space relies on the optimization along the quotient fibers. In this context the optimization problem is known as graph matching (or network alignment) and corresponds in finding a match between the two sets of nodes which minimizes the distance between the corresponding adjacency matrices. The distance function needs an aligner input, which solves the minimization problem by returning the second input graph optimally aligned. Two of the available aligners are:

1. ‘ID’: the identity matching returning the same node order which corresponds to work with labelled graphs;

2. ‘FAQ’: the Fast Quadratic Assignment Matching implemented in scipy.optimize.quadratic assignment which is the state of the art in the matching literature based on the Frobenius norm.

 In [15]:

gs_m.dist(graph_a=gr_sampled[0], graph_b=gr_sampled[1])

 Out [15]:

1.4460818164862586

 In [16]:

gs_m.set_aligner("FAQ")

gs_m.dist(graph_a=gr_sampled[0], graph_b=gr_sampled[1])

 Out [16]:

1.3431542528099538


We can align a set of points using the following function, which returns the permuted graphs:

 In [17]:

gs_m.align_point_to_point(base_graph=gr_sampled, graph_to_permute=gr_sampled)

 Out [17]:

array([[[-0.34330033, -0.35911276,  0.25708028,  0.23632492],
[-0.14433691, -0.15890698,  0.16680305, -0.28289936],
[ 0.06142698, -0.37582122, -0.18026352,  0.45321387],
[-0.36264321,  0.0694131 ,  0.47566548,  0.00336706]],

[[ 0.16766421, -0.4658085 , -0.04388063, -0.34414864],
[-0.02395103, -0.33029756,  0.39625834, -0.12660624],
[-0.12030707,  0.35831659,  0.14606105,  0.0834617 ],
[ 0.16835003, -0.32220738,  0.34924802, -0.05762742]]])


The permutations of the nodes computed by the align_point_to_point function are saved in the perm_ attributes. Notice that only the last run output is saved.

 In [18]:

gs_m.perm_

 Out [18]:

array([[0, 1, 2, 3],
[0, 1, 2, 3]])


### Graph to Geodesic Alignment#

In different algorithms for graphs, we need to align a graph to a geodesic. Given a point $$[X] \in X/T$$, a $$\gamma : \mathbb{R} \rightarrow X$$, a domain $$[s_{min}, s_{max}] \in \mathbb{R}$$, the alignment with respect to a geodesic is performed as follow:

1. Sample $$s_i\in [s_{min}, s_{max}]$$

2. Compute $$\gamma(s_i)$$

3. Compute $$d_i = min_{t\in T}d_X(\gamma(s_i), t^TXt)$$ is minimum

4. Select the $$t^TXt$$ corresponding to the $$d_i$$ minimum

The algorithm is described in: Huckemann, S., Hotz, T., & Munk, A. (2010). Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Statistica Sinica, 1-58.

 In [19]:

geo=gs_m.geodesic(gr_sampled[0], gr_sampled[1])


To perform the alignment between the geodesic and the point, we need to to define which methodology to adopt. This is specified in the set_point_to_geodesic function.

 In [20]:

gs_m.set_point_to_geodesic_aligner('default', s_min=-1., s_max=1.)

 Out [20]:

<geomstats.geometry.stratified.graph_space.PointToGeodesicAligner at 0x20cff666d90>

 In [21]:

gs_m.align_point_to_geodesic(geodesic=geo, graph_to_permute=gr_sampled[1])

 Out [21]:

array([[-0.05762742, -0.32220738,  0.34924802,  0.16835003],
[-0.12660624, -0.33029756,  0.39625834, -0.02395103],
[ 0.0834617 ,  0.35831659,  0.14606105, -0.12030707],
[-0.34414864, -0.4658085 , -0.04388063,  0.16766421]])