geomstats.geometry.stratified package#

Submodules#

geomstats.geometry.stratified.bhv_space module#

Class for the BHV Tree Space.

Lead author: Jonas Lueg

References

[BHV01] (1,2)

Billera, L. J., S. P. Holmes, K. Vogtmann. “Geometry of the Space of Phylogenetic Trees.” Advances in Applied Mathematics, volume 27, issue 4, pages 733-767, 2001. https://doi.org/10.1006%2Faama.2001.0759

[OP11] (1,2,3)

Owen, M., J. S. Provan. “A Fast Algorithm for Computing Geodesic Distances in Tree Space.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, volume 8, issue 1, pages 2-13, 2011. https://doi.org/10.1109/TCBB.2010.3

class geomstats.geometry.stratified.bhv_space.BHVMetric(space)[source]#

Bases: PointSetMetric

BHV metric for Tree Space for phylogenetic trees.

The BHV Tree Space as it is introduced in [BHV01], a metric space that is CAT(0), and there exist unique geodesics between each pair of points in the BHV Space. The polynomial time algorithm for computing the distance and geodesic between two points is implemented, following the definitions and results of [OP11]. There, computing the geodesic between two trees is called the ‘Geodesic Tree Path’ problem, and that is why some methods below (not visible to the user though) start with the letters ‘gtp’.

Parameters:

total_space (TreeSpace) – Set with quotient structure.

dist(point_a, point_b)[source]#

Compute the distance between two points.

Parameters:
  • point_a (Tree or TreeBatch) – A point in BHV Space.

  • point_b (Tree or TreeBatch) – A point in BHV Space.

Returns:

dist (array-like, shape=[…]) – The distance between the two points.

geodesic(initial_point, end_point)[source]#

Compute the geodesic between two points.

Parameters:
  • initial_point (Tree or TreeBatch) – A point in BHV Space.

  • end_point (Tree or TreeBatch) – A point in BHV Space.

Returns:

geodesic (callable) – The geodesic between the two points. Takes parameter t, that is the time between 0 and 1 at which the corresponding point on the path is returned.

squared_dist(point_a, point_b)[source]#

Compute the squared distance between two points.

Parameters:
  • point_a (Tree or TreeBatch) – A point in BHV Space.

  • point_b (Tree or TreeBatch) – A point in BHV Space.

Returns:

squared_dist (array-like, shape=[…]) – The squared distance between the two points.

class geomstats.geometry.stratified.bhv_space.GTPSolver(n_labels, tol=1e-08)[source]#

Bases: object

‘Geodesic Tree Path’ problem solver [OP11].

Essentially uses Theorem 2.4 from [OP11].

Parameters:

tol (float) – Tolerance for the algorithm, in particular for the decision problem in the GTP algorithm in [OP11] to avoid unambiguity.

dist(point_a, point_b)[source]#

Compute the distance between two points.

Parameters:
  • point_a (Tree or TreeBatch) – A point in BHV Space.

  • point_b (Tree or TreeBatch) – A point in BHV Space.

Returns:

dist (array-like, shape=[…]) – The distance between the two points.

geodesic(initial_point, end_point)[source]#

Compute the geodesic between two points.

Parameters:
  • initial_point (Tree or TreeBatch) – A point in BHV Space.

  • end_point (Tree or TreeBatch) – A point in BHV Space.

Returns:

geodesic (callable) – The geodesic between the two points. Takes parameter t, that is the time between 0 and 1 at which the corresponding point on the path is returned.

squared_dist(point_a, point_b)[source]#

Compute the squared distance between two points.

Parameters:
  • point_a (Tree or TreeBatch) – A point in BHV Space.

  • point_b (Tree or TreeBatch) – A point in BHV Space.

Returns:

squared_dist (array-like, shape=[…]) – The squared distance between the two points.

class geomstats.geometry.stratified.bhv_space.Tree(splits, lengths)[source]#

Bases: Point

A class for trees, that are phylogenetic trees, elements of the BHV space.

A tree is essentially a phylogenetic tree with edges having length greater than zero. The representation of the tree is via splits, the edge lengths are stored in a vector.

Parameters:
  • splits (list[Split]) – The structure of the tree in form of a set of splits of the set of labels.

  • lengths (array-like, shape=[n_splits]) – The edge lengths of the splits, a vector containing positive numbers.

topology#

The topology of the tree.

Type:

TreeTopology

lengths#

The edge lengths of the splits, a vector containing positive numbers.

Type:

array-like, shape=[n_splits]

equal(point, atol=1e-12)[source]#

Check equality against another point.

Parameters:
  • point (Tree or TreeBatch) – Point to compare against point.

  • atol (float)

Returns:

is_equal (array-like, shape=[…])

class geomstats.geometry.stratified.bhv_space.TreeBatch(iterable=(), /)[source]#

Bases: PointBatch

Tree batch.

property lengths#

Edge lengths.

Returns:

lengths (array-like, shape=[n_points, n_splits])

property topology#

Tree topology.

Returns:

topology (list[TreeTopology])

class geomstats.geometry.stratified.bhv_space.TreeSpace(n_labels, equip=True)[source]#

Bases: PointSet

Class for the Tree space, a point set containing phylogenetic trees.

A topological space. Points in Tree space are instances of the class Tree: phylogenetic trees with edge lengths between 0 and infinity. For the space of trees see also [BHV01].

Parameters:

n_labels (int) – The number of labels in the trees.

belongs(point, atol=1e-12)[source]#

Check if a point belongs to Tree space.

Parameters:
  • point (Tree or TreeBatch) – The point to be checked.

  • atol (float) – Absolute tolerance. Optional, default: backend atol.

Returns:

belongs (array-like, shape=[…]) – Boolean denoting if point belongs to Tree space.

static default_metric()[source]#

Metric to equip the space with if equip is True.

random_point(n_samples=1, p_keep=0.9, btol=1e-08)[source]#

Sample a random point in Tree space.

Parameters:
  • n_samples (int) – Number of samples. Defaults to 1.

  • p_keep (float between 0 and 1) – The probability that a sampled edge is kept and not deleted randomly. To be precise, it is not exactly the probability, as some edges cannot be deleted since the requirement that two labels are separated by a split might be violated otherwise. Defaults to 0.9

  • btol (float) – Tolerance for the boundary of the edge lengths. Defaults to 1e-08.

Returns:

samples (Tree or TreeBatch) – Points sampled in Tree space.

class geomstats.geometry.stratified.bhv_space.TreeTopology(splits)[source]#

Bases: ForestTopology

The topology of a tree, using a split-based representation.

Parameters:

splits (list[Split]) – The structure of the tree in form of a set of splits of the set of labels.

n_labels#

Number of labels, the set of labels is then \(\{0,\dots,n-1\}\).

Type:

int

where#

Give the index of a split in the flattened list of all splits.

Type:

dict

sep#

An increasing list of numbers between 0 and m, where m is the total number of splits in self.split_sets, starting with 0, where each number indicates that a new connected component starts at that index. Useful for example for unraveling the tuple of all splits into self.split_sets.

Type:

list of int

paths#

A list of dictionaries, each dictionary is for the respective connected component of the forest, and the items of each dictionary are for each pair of labels u, v, u < v in the respective component, a list of the splits on the unique path between the labels u and v.

Type:

list of dict

support#

For each split, give an \(n\times n\) dimensional matrix, where the uv-th entry is True if the split separates the labels u and v, else False.

Type:

list of array-like

property labels#

Node labels.

Returns:

labels (tuple[int]) – Node labels.

property splits#

Splits.

Returns:

splits (list[Split]) – The structure of the tree in form of a set of splits of the set of labels.

geomstats.geometry.stratified.bhv_space.generate_random_tree(n_labels, p_keep=0.9, btol=1e-08)[source]#

Generate a random instance of Tree.

Parameters:
  • p_keep (float between 0 and 1) – The probability that a sampled edge is kept and not deleted randomly. To be precise, it is not exactly the probability, as some edges cannot be deleted since the requirement that two labels are separated by a split might be violated otherwise. Defaults to 0.9

  • btol (float) – Tolerance for the boundary of the edge lengths. Defaults to 1e-08.

geomstats.geometry.stratified.graph_space module#

Graph Space.

Lead author: Anna Calissano.

References

[Calissano2020]

Calissano, A., Feragen, A., Vantini, S. “Graph Space: Geodesic Principal Components for a Population of Network-valued Data.” Mox report 14, 2020. https://mox.polimi.it/reports-and-theses/publication-results/?id=855.

class geomstats.geometry.stratified.graph_space.ExhaustiveAligner(total_space)[source]#

Bases: GraphSpaceAlignerAlgorithm

Brute force exact alignment.

Exact Alignment obtained by exploring the whole permutation group.

Parameters:

total_space (GraphSpace) – Set with quotient structure.

Notes

Not recommended for large n_nodes.

class geomstats.geometry.stratified.graph_space.FAQAligner(total_space)[source]#

Bases: GraphSpaceAlignerAlgorithm

Fast Quadratic Assignment for graph matching (or network alignment).

References

[Vogelstein2015]

Vogelstein JT, Conroy JM, Lyzinski V, Podrazik LJ, Kratzer SG, Harley ET, Fishkind DE, Vogelstein RJ, Priebe CE. “Fast approximate quadratic programming for graph matching.“ PLoS One. 2015 Apr 17; doi: 10.1371/journal.pone.0121002.

class geomstats.geometry.stratified.graph_space.GraphSpace(n_nodes, equip=True)[source]#

Bases: Matrices

Class for the Graph Space.

Graph Space to analyse populations of labelled and unlabelled graphs. The space focuses on graphs with scalar euclidean attributes on nodes and edges, with a finite number of nodes and both directed and undirected edges. For undirected graphs, use symmetric adjacency matrices. The space is a quotient space obtained by applying the permutation action of nodes to the space of adjacency matrices. Notice that for computation reasons the module works with both the gs.array representation of graph and the GraphPoint representation.

Points are represented by \(nodes \times nodes\) adjacency matrices. Both the array input and the Graph Point type input work.

Parameters:
  • n_nodes (int) – Number of graph nodes

  • total_space (space) – Total Space before applying the permutation action. Default: Euclidean Space.

References

[Calissano2020]

Calissano, A., Feragen, A., Vantini, S. “Graph Space: Geodesic Principal Components for a Population of Network-valued Data.” Mox report 14, 2020. https://mox.polimi.it/reports-and-theses/publication-results/?id=855.

[Jain2009]

Jain, B., Obermayer, K. “Structure Spaces.” Journal of Machine Learning Research, 10(11), 2009. https://www.jmlr.org/papers/volume10/jain09a/jain09a.pdf

equip_with_group_action(group_action='permutations')[source]#

Equip manifold with group action.

new(equip=True)[source]#

Create manifold with same parameters.

class geomstats.geometry.stratified.graph_space.GraphSpaceAligner(total_space, align_algo=None)[source]#

Bases: Aligner

Graph space aligner.

Parameters:
  • total_space (GraphSpace) – Set with quotient structure.

  • align_algo (GraphSpaceAlignerAlgorithm) – Algorihtm performing alignment.

MAP_ALIGNER = {'FAQ': <class 'geomstats.geometry.stratified.graph_space.FAQAligner'>, 'exhaustive': <class 'geomstats.geometry.stratified.graph_space.ExhaustiveAligner'>}#
align_point_to_geodesic(geodesic, point)[source]#

Align point to a geodesic.

Using the selected alignment technique, it returns the aligned point as optimally aligned to the geodesic.

Parameters:
  • geodesic (function) – Geodesic.

  • point (array-like, shape=[…, n_nodes, n_nodes]) – Graph to align.

Returns:

permuted_graph (list, shape = […, n_nodes, n_nodes])

property perm_#

Optimal node permutations.

Returns:

  • perm_ (array-like, shape=[…, n_nodes]) – Node permutations where in position i we have the value j meaning

  • the node i should be permuted with node j.

set_alignment_algorithm(align_algo='FAQ', **kwargs)[source]#

Set the aligning strategy.

GraphSpace metric relies on alignment. In this module we propose the the FAQ graph matching by [Vogelstein2015], and exhaustive aligner which explores the whole permutation group.

Parameters:

align_algo (str or GraphSpaceAlignerAlgorithm) – ‘FAQ’: Fast Quadratic Assignment - only compatible with Frobenius norm, ‘exhaustive’: all group exhaustive search

set_point_to_geodesic_aligner(aligner='default', **kwargs)[source]#

Set the alignment between a point and a geodesic.

Following the geodesic to point alignment in [Calissano2020] and [Huckemann2010], this function defines the parameters [s_min, s_max] and the number of points to sample in the domain.

Parameters:
  • aligner (BasePointToGeodesicAligner)

  • s_min (float) – Minimum value of the domain to sample along the geodesics.

  • s_max (float) – Minimum value of the domain to sample along the geodesics.

  • n_points (int) – Number of points to sample between s_min and s_max.

class geomstats.geometry.stratified.graph_space.GraphSpaceAlignerAlgorithm(total_space)[source]#

Bases: AlignerAlgorithm, ABC

Base class for graph space numerical aligner.

total_space#

Set with quotient structure.

Type:

GraphSpace

perm_#

Node permutations where in position i we have the value j meaning the node i should be permuted with node j.

Type:

array-like, shape=[…, n_nodes]

align(point, base_point)[source]#

Align point to base point.

Parameters:
  • point (array-like, shape=[…, n_nodes, n_nodes]) – Graph to align.

  • base_point (array-like, shape=[…, n_nodes, n_nodes]) – Reference graph.

Returns:

aligned_point (array-like, shape=[…, n_nodes, n_nodes]) – Aligned graph.

class geomstats.geometry.stratified.graph_space.GraphSpaceQuotientMetric(space, total_space)[source]#

Bases: QuotientMetric

Class for the Graph Space Metric.

Every metric \(d: X \times X \rightarrow \mathbb{R}\) on the total space of adjacency matrices can descend onto the quotient space as a pseudo-metric: \(d([x_1],[x_2]) = min_{t\in T} d_X(x_1, t^Tx_2t)\). The metric relies on the total space metric and an alignment procedure, i.e., Graph Matching or Networks alignment procedure. Metric, alignment, geodesics, and alignment with respect to a geodesic are defined. By default, the alignment is FAQ and the total space metric is the Frobenious norm.

Parameters:

space (GraphSpace) – GraphSpace object.

References

[Calissano2020]

Calissano, A., Feragen, A., Vantini, S. “Graph Space: Geodesic Principal Components for a Population of Network-valued Data.” Mox report 14, 2020. https://mox.polimi.it/reports-and-theses/publication-results/?id=855.

[Jain2009]

Jain, B., Obermayer, K. “Structure Spaces.” Journal of Machine Learning Research 10.11 (2009). https://www.jmlr.org/papers/v10/jain09a.html.

class geomstats.geometry.stratified.graph_space.PointToGeodesicAligner(total_space, s_min=0.0, s_max=1.0, n_grid=10)[source]#

Bases: PointToGeodesicAlignerBase

Class for the alignment of the points with respect to a geodesic.

Implementing the algorithm in [Huckemann2010] to select an optimal alignment to a point with respect to a geodesic. The algorithm sample discrete set of n_points along the geodesic between [s_min, s_max] and find the permutation that gets closer to the datapoints along the geodesic.

Parameters:
  • total_space (GraphSpace) – Set with quotient structure.

  • s_min (float) – Minimum value of the domain to sample along the geodesics.

  • s_max (float) – Minimum value of the domain to sample along the geodesics.

  • n_grid (int) – Number of points to sample between s_min and s_max.

References

[Calissano2020]

Calissano, A., Feragen, A., Vantini, S. “Graph Space: Geodesic Principal Components for a Population of Network-valued Data.” Mox report 14, 2020. https://mox.polimi.it/reports-and-theses/publication-results/?id=855.

[Huckemann2010] (1,2)

Huckemann, S., Hotz, T., Munk, A. “Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric Lie group actions.” Statistica Sinica, 1-58, 2010.

align(geodesic, point)[source]#

Align the graph to the geodesic.

Parameters:
  • geodesic (function) – Geodesic function in GraphSpace.

  • point (array-like, shape=[…, n_nodes, n_nodes]) – Graph to align.

Returns:

aligned_point (array-like, shape=[…, n_nodes, n_nodes]) – Permuted graph as to be aligned with respect to the geodesic.

dist(geodesic, point)[source]#

Compute the distance between the geodesic and the point.

Parameters:
  • geodesic (function) – Geodesic function in GraphSpace.

  • point (array-like, shape=[…, n_nodes, n_nodes]) – Graph to align.

Returns:

dist (array-like, shape=[…, n_nodes]) – Distance between the point and the geodesic.

Notes

Due to the discrete nature of the method, distance is not very accurate.

property s_grid#

Save the domain discretization.

class geomstats.geometry.stratified.graph_space.PointToGeodesicAlignerBase(total_space)[source]#

Bases: ABC

Base class for point to geodesic aligner.

Parameters:

total_space (GraphSpace) – Set with quotient structure.

perm_#

Node permutations where in position i we have the value j meaning the node i should be permuted with node j.

Type:

array-like, shape=[…, n_nodes]

abstract align(geodesic, point)[source]#

Class for the alignment of the geodesic with respect to a point.

abstract dist(geodesic, point)[source]#

Class to compute distance between the geodesic with respect to a point.

geomstats.geometry.stratified.point_set module#

Class for Stratified Spaces.

Lead authors: Anna Calissano and Jonas Lueg

class geomstats.geometry.stratified.point_set.Point[source]#

Bases: ABC

Class for points of a set.

abstract equal(point, atol=1e-12)[source]#

Check equality against another point.

Parameters:
  • point (Point or PointBatch) – Point to compare against point.

  • atol (float)

Returns:

is_equal (array-like, shape=[…])

class geomstats.geometry.stratified.point_set.PointBatch(iterable=(), /)[source]#

Bases: ABC, list

Class for point batch.

equal(point, atol=1e-12)[source]#

Check equality against another point.

Parameters:
  • point (Point or PointBatch) – Point to compare against point.

  • atol (float)

class geomstats.geometry.stratified.point_set.PointSet(equip=True)[source]#

Bases: ABC

Class for a set of points of type Point.

abstract belongs(point, atol=1e-12)[source]#

Evaluate if a point belongs to the set.

Parameters:
  • point (Point or PointBatch) – Point to evaluate.

  • atol (float) – Absolute tolerance. Optional, default: backend atol.

Returns:

belongs (array-like, shape=[…]) – Boolean evaluating if point belongs to the set.

equip_with_metric(Metric=None, **metric_kwargs)[source]#

Equip manifold with Metric.

Parameters:

Metric (PointSetMetric object) – If None, default metric will be used.

abstract random_point(n_samples=1)[source]#

Sample random points on the PointSet.

Parameters:

n_samples (int) – Number of samples. Optional, default: 1.

Returns:

samples (Point or PointBatch) – Points sampled on the PointSet.

class geomstats.geometry.stratified.point_set.PointSetMetric(space)[source]#

Bases: ABC

Class for the lenght spaces.

Parameters:

space (PointSet) – Set to equip with metric.

abstract dist(point_a, point_b)[source]#

Distance between two points in the PointSet.

Parameters:
  • point_a (Point or PointBatch) – Point in the PointSet.

  • point_b (Point or PointBatch) – Point in the PointSet.

Returns:

distance (array-like, shape=[…]) – Distance.

abstract geodesic(initial_point, end_point)[source]#

Compute the geodesic in the PointSet.

Parameters:
  • initial_point (Point or PointBatch) – Point in the PointSet.

  • end_point (Point or PointBatch) – Point in the PointSet.

Returns:

path (callable) – Time parameterized geodesic curve.

geomstats.geometry.stratified.quotient module#

Quotient structure for a geodesic metric space.

class geomstats.geometry.stratified.quotient.Aligner(total_space, align_algo=None)[source]#

Bases: ABC

Bundle structure.

Parameters:
  • total_space (PointSet) – Set with quotient structure.

  • align_algo (AlignerAlgorithm) – Algorihtm performing alignment.

align(point, base_point)[source]#

Align point to base point.

Parameters:
  • point (array-like, shape=[…, *point_shape]) – Point to align.

  • base_point (array-like, shape=[…, *point_shape]) – Reference point.

Returns:

aligned_point (list, shape = […, *point_shape])

class geomstats.geometry.stratified.quotient.QuotientMetric(space, total_space)[source]#

Bases: PointSetMetric, ABC

Quotient metric.

Parameters:
  • space (PointSet) – Set to equip with metric.

  • total_space (PointSet) – Set with quotient structure.

dist(point_a, point_b)[source]#

Compute distance between two points.

Parameters:
  • point_a (array-like, shape=[…, *point_shape])

  • point_b (array-like, shape=[…, *point_shape])

Returns:

distance (array-like, shape=[…]) – Distance between the points.

geodesic(initial_point, end_point)[source]#

Compute geodesic between two points.

Parameters:
  • initial_point (array-like, shape=[…, *point_shape]) – Initial point.

  • end_point (array-like, shape=[…, *point_shape]) – End point.

Returns:

geodesic (callable) – Geodesic function.

squared_dist(point_a, point_b)[source]#

Compute distance between two points.

Parameters:
  • point_a (array-like, shape=[…, *point_shape])

  • point_b (array-like, shape=[…, *point_shape])

Returns:

distance (array-like, shape=[…]) – Distance between the points.

geomstats.geometry.stratified.spider module#

Class for the spider.

Lead authors: Anna Calissano & Jonas Lueg

class geomstats.geometry.stratified.spider.Spider(n_rays, equip=True)[source]#

Bases: PointSet

Spider: a set of rays attached to the origin.

The k-spider consists of k copies of the positive real line \(\mathbb{R}_{\geq 0}\) glued together at the origin [Feragen2020].

Parameters:

n_rays (int) – Number of rays to attach to the origin.

References

[Feragen2020]

Feragen, Aasa, and Tom Nye. “Statistics on stratified spaces.” Riemannian Geometric Statistics in Medical Image Analysis. Academic Press, 2020. 299-342.

belongs(point, atol=1e-12)[source]#

Check if a random point belongs to the spider set.

Parameters:

point (SpiderPoint or PointBatch) – Point to be checked.

Returns:

belongs (array-like, shape=[…]) – Boolean evaluating if point belongs to the set.

static default_metric()[source]#

Metric to equip the space with if equip is True.

random_point(n_samples=1)[source]#

Compute a random point of the spider set.

Parameters:

n_samples (int) – Number of samples. Optional, default: 1.

Returns:

samples (SpiderPoint or PointBatch) – List of SpiderPoints randomly sampled from the Spider.

class geomstats.geometry.stratified.spider.SpiderMetric(space)[source]#

Bases: PointSetMetric

Geometry on the Spider, induced by the rays metric.

dist(point_a, point_b)[source]#

Compute the distance between two points on the Spider using the ray geometry.

The spider metric is the metric in each ray extended to the Spider: given two points x, y on different rays, d(x, y) = d(x, 0) + d(0, y).

Parameters:
  • point_a (SpiderPoint or PointBatch) – Point in the Spider.

  • point_b (SpiderPoint or PointBatch) – Point in the Spider.

Returns:

dist (array-like, shape=[…]) – Distance between points.

geodesic(initial_point, end_point)[source]#

Return the geodesic between two lists of Spider points.

Parameters:
  • initial_point (SpiderPoint or PointBatch) – Point in the Spider.

  • end_point (SpiderPoint or PointBatch) – Point in the Spider.

Returns:

path (callable) – Return a vectorized geodesic function.

class geomstats.geometry.stratified.spider.SpiderPoint(stratum, coord)[source]#

Bases: Point

Class for points of the Spider.

A point in the Spider is \((s,c) \in \mathbb{N} \times \mathbb{R}\).

Parameters:
  • stratum (int) – The stratum, an integer indicating the stratum the point lies in. If zero, then the point is on the origin.

  • coord (array-like, shape=[1,]) – A positive number, the coordinate of the point. It must be zero if and only if the stratum is zero, i.e. the origin.

equal(point, atol=1e-12)[source]#

Check equality against another point.

Parameters:
  • point (Point or PointBatch) – Point to compare against.

  • atol (float)

Returns:

is_equal (array-like, shape=[…])

geomstats.geometry.stratified.trees module#

Helper classes for tree spaces.

Class Split. Essentially, a Split is a two-set partition of a subset of \(\{0,\dots,n-1\}\). This class is designed such that one part of both parts of the partition can be empty. Splits are corresponding uniquely to edges in a phylogenetic forest, where, if one cuts the edge in the forest, the resulting two-set partition of the labels of the respective component of the forest is the corresponding split.

Lead author: Jonas Lueg

class geomstats.geometry.stratified.trees.ForestTopology(partition, split_sets)[source]#

Bases: object

The topology of a forest, using a split-based graph-structure representation.

A forest topology is a partition into non-empty sets of the set \(\{0,\dots,n-1\}\), together with a set of splits for each element of the partition, where every split is a two-set partition of the respective element. A structure basically describes a phylogenetic forest, where each set of splits gives the structure of the tree with the labels of the corresponding element of the partition.

Parameters:
  • partition (tuple) – A tuple of tuples that is a partition of the set \(\{0,\dots,n-1\}\), representing the label sets of each connected component of the forest topology.

  • split_sets (tuple) – A tuple of tuples containing splits, where each set of splits contains only splits of the respective label set in the partition, so their order is related. The splits are the edges of the connected components of the forest, respectively, and thus the union of all sets of splits yields all edges of the forest topology.

n_labels#

Number of labels, the set of labels is then \(\{0,\dots,n-1\}\).

Type:

int

n_splits#

Number of splits.

Type:

int

where#

Give the index of a split in the flattened list of all splits.

Type:

dict

sep#

An increasing list of numbers between 0 and m, where m is the total number of splits in self.split_sets, starting with 0, where each number indicates that a new connected component starts at that index. Useful for example for unraveling the tuple of all splits into self.split_sets.

Type:

list of int

paths#

A list of dictionaries, each dictionary is for the respective connected component of the forest, and the items of each dictionary are for each pair of labels u, v, u < v in the respective component, a list of the splits on the unique path between the labels u and v.

Type:

list of dict

support#

For each split, give an \(n\times n\) dimensional matrix, where the uv-th entry is True if the split separates the labels u and v, else False.

Type:

list of array-like

corr(weights)[source]#

Compute the correlation matrix of the topology with edge weights weights.

Parameters:

weights (array-like, [n_splits]) – Edge weights.

Returns:

corr (array-like, shape=[n, n]) – Returns the corresponding correlation matrix.

corr_gradient(weights)[source]#

Compute the gradient of the correlation matrix, differentiated by weights.

Parameters:

weights (array-like, [n_splits]) – The vector weights at which the gradient is computed.

Returns:

gradient (array-like, shape=[n_splits, n, n]) – The gradient of the correlation matrix, differentiated by weights.

class geomstats.geometry.stratified.trees.Split(part1, part2)[source]#

Bases: object

Two-set partitions of sets.

Two-set partitions of a smaller subset of \(\{0,...,n-1\}\) are also allowed, where \(n\) is a positive integer, which is not passed as an argument as it is nowhere needed.

The parameters part1 and part2 are assigned to the attributes self.part1 and self.part2 in a unique sorted way: the one that contains the smallest minimal value is assigned to self.part1 for consistency.

Parameters:
  • part1 (iterable) – The first part of the split, an iterable that is a subset of \(\{0,\dots,n-1\}\). It may be empty, but must have empty intersection with part2.

  • part2 (iterable) – The second part of the split, an iterable that is a subset of \(\{0,\dots,n-1\}\). It may be empty, but must have empty intersection with part1.

get_part_away_from(other)[source]#

Return the part of this split that is directed away from other split.

Parameters:

other (Split) – The other split.

Returns:

part_that_does_not_point (iterable) – Return the part of the split self that does not point toward other. See self.get_part_towards for further explanation.

get_part_towards(other)[source]#

Return the part of this split that is directed toward the other split.

Each split contains part1 and part2, the parts that the corresponding edge in the graph splits the set of labels into. Thus, one can think of the split as an edge, where part1 points in the direction of the part of the tree where the labels of part1 are contained, and part2 points in the other direction. So, part1 points in the direction of other, if it corresponds to an edge that is contained in the tree that part1 points to, else part2 points in the direction of split.

Parameters:

other (Split) – The other split.

Returns:

part_towards (iterable) – Return the part of the split self that points toward other_split.

is_compatible(other)[source]#

Check whether this split is compatible with another split.

Two splits are compatible, if at least one intersection of the respective parts of the splits is empty.

Parameters:

other (Split) – The other split.

Returns:

is_compatible_with (bool) – Return True if the splits are compatible, else False.

part_contains(subset)[source]#

Determine if a subset is contained in either part of a split.

Parameters:

subset (set) – The subset containing labels.

Returns:

is_contained (bool) – A boolean that is true if the subset is contained in self.part1 or self.part2.

restrict_to(subset)[source]#

Return the restriction of a split to a subset.

Parameters:

subset (set) – The subset that the split is restricted to.

Returns:

restr_split (Split) – The restricted split, if the split is \(A\vert B\), then the split restricted to the subset \(C\) is \(A\cap C\vert B\cap C\).

separates(u, v)[source]#

Determine whether the labels (or label sets) are separated by the split.

Parameters:
  • u (list of int, int) – Either an integer or a set of labels.

  • v (list of int, int) – Either an integer or a set of labels.

Returns:

is_separated (bool) –

A boolean determining whether u and v are separated by the split (i.e. if

they are not in the same part).

geomstats.geometry.stratified.trees.check_if_separated(labels, splits)[source]#

Check for each pair of labels if exists split that separates them.

Parameters:
  • labels (list[int]) – A list of integers, the set of labels that we generate splits for.

  • splits (list[Split]) – A list of splits of the set of labels.

Returns:

are_separated (bool) – True if the labels are pair-wise separated by a split else False.

geomstats.geometry.stratified.trees.delete_splits(splits, labels, p_keep, check=True)[source]#

Delete splits randomly from a set of splits.

We require the splits to satisfy the check for if all pair-wise labels are separated. In this way, before deleting a split, this condition is checked to make sure it is not violated.

Parameters:
  • splits (list[Split]) – A list of splits of the set of labels.

  • labels (list[int]) – A list of integers, the set of labels that we generate splits for.

  • p_keep (float) – A float between 0 and 1 determining the probability with which a split is kept and not deleted.

  • check (bool) – If True, checks if splits still separate all labels. In this case, the split will not be deleted. If False, any split can be randomly deleted.

Returns:

left_over_splits (list[Split]) – The list of splits that are not deleted.

geomstats.geometry.stratified.trees.generate_splits(labels)[source]#

Generate random maximal set of compatible splits of set labels.

This method works inductively on the number of elements in labels. Start with a split of two randomly chosen labels. Then, successively choose a label from the labels and add this as a leaf with a split to the existing tree by attaching it to a random split, thereby dividing this split into two splits and one has to update all the other splits accordingly.

Parameters:

labels (list[int]) – A list of integers, the set of labels that we generate splits for.

Returns:

splits (list[Split]) – A list of splits of the set of labels, maximal number of splits.

geomstats.geometry.stratified.vectorization module#

Decorator to handle vectorization.

geomstats.geometry.stratified.vectorization.broadcast_lists(*lists)[source]#

Broadcast lists.

Similar behavior as gs.broadcast_arrays, but for lists.

Parameters:

*lists (list)

Returns:

*broadcasted_lists (list) – Lists with broadcasted length.

geomstats.geometry.stratified.vectorization.vectorize_point(*args_positions, manipulate_input=<function _manipulate_input>, manipulate_output=<function _manipulate_output>)[source]#

Check point type and transform in iterable if not the case.

Parameters:

args_positions (tuple) – Position and corresponding argument name. A tuple for each position.

Notes

Explicitly defining args_positions and args names ensures it works for all combinations of input calling.

geomstats.geometry.stratified.wald_space module#

Classes for the Wald Space and elements therein of class Wald and helper classes.

Lead author: Jonas Lueg

References

[Garba21]

Garba, M. K., T. M. W. Nye, J. Lueg and S. F. Huckemann. “Information geometry for phylogenetic trees” Journal of Mathematical Biology, 82(3):19, February 2021a. https://doi.org/10.1007/s00285-021-01553-x.

[Lueg21] (1,2)

Lueg, J., M. K. Garba, T. M. W. Nye, S. F. Huckemann. “Wald Space for Phylogenetic Trees.” Geometric Science of Information, Lecture Notes in Computer Science, pages 710–717, Cham, 2021. https://doi.org/10.1007/978-3-030-80209-7_76.

class geomstats.geometry.stratified.wald_space.BasicWaldGeodesicSolver(space)[source]#

Bases: ABC

Abstract class for wald geodesic solver.

Parameters:

space (WaldSpace)

discrete_geodesic(initial_point, end_point)[source]#

Compute a discrete geodesic in the WaldSpace.

Parameters:
  • initial_point (Wald or WaldBatch) – Point in the WaldSpace.

  • initial_point (Wald or WaldBatch) – Point in the WaldSpace.

Returns:

geod_points (WaldBatch or list[WaldBatch]) – Time parameterized geodesic curve.

geodesic(initial_point, end_point)[source]#

Compute the geodesic in the WaldSpace.

Parameters:
  • initial_point (Wald or WaldBatch) – Point in the WaldSpace.

  • initial_point (Wald or WaldBatch) – Point in the WaldSpace.

Returns:

path (callable) – Time parameterized geodesic curve.

class geomstats.geometry.stratified.wald_space.DiscreteWaldPath(space, path, interpolator=None, **interpolator_kwargs)[source]#

Bases: object

A uniformly-sampled discrete path.

Parameters:
  • space (WaldSpace)

  • path (WaldBatch) – Wald collection with common topology.

  • interpolator (Interpolator1D)

class geomstats.geometry.stratified.wald_space.LocalProjectionSolver(space, btol=1e-10)[source]#

Bases: object

Local projection solver.

projection(ambient_point, topology, initial_weights=None)[source]#

Project ambient point into wald space.

Parameters:
  • ambient_point (array-like, shape=[…, n_nodes, n_nodes]) – Ambient point to project.

  • topology (ForestTopology or list[ForestTopology]) – Stratum topology.

  • initial_weights (array-like, shape=[…, n_splits]) – Initial guess for weights.

class geomstats.geometry.stratified.wald_space.NaiveProjectionGeodesicSolver(space, n_grid=10)[source]#

Bases: BasicWaldGeodesicSolver

Naive geodesic projection solver.

Implementation of algorithm 1 from [Lueg21].

class geomstats.geometry.stratified.wald_space.SuccessiveProjectionGeodesicSolver(space, n_grid=10)[source]#

Bases: BasicWaldGeodesicSolver

Successive projection geodesic projection solver.

Implementation of algorithm 2 from [Lueg21].

class geomstats.geometry.stratified.wald_space.Wald(topology, weights)[source]#

Bases: Point

A class for wälder, that are phylogenetic forests, elements of the Wald Space.

A wald is essentially a phylogenetic forest with weights between zero and one on the edges. The forest structure is stored as a ForestTopology and the edge weights are an array of length that is equal to the total number of splits in the structure. These elements are the points in Wald space and other phylogenetic forest spaces, like BHV space, although the partition is just the whole set of labels in this case.

Parameters:
  • topology (ForestTopology) – The structure of the forest.

  • weights (array-like, shape=[n_splits]) – The edge weights, array of floats between 0 and 1, with m entries, where m is the total number of splits/edges in the structure top.

  • corr (array-like, shape=[n_labels, n_labels]) – Correlation matrix of the topology with edge weights.

equal(point, atol=1e-12)[source]#

Check equality against another point.

Parameters:
  • point (Wald or WaldBatch) – Point to compare against point.

  • atol (float)

Returns:

is_equal (array-like, shape=[…])

class geomstats.geometry.stratified.wald_space.WaldBatch(iterable=(), /)[source]#

Bases: PointBatch

Wald batch.

property corr#

Correlation matrix of the topology with edge weights.

Returns:

corr (array-like, shape=[n_points, n_nodes, n_nodes])

property topology#

Forest topology.

Returns:

topology (list[ForestTopology])

property weights#

Edge weights.

Returns:

weights (array-like, shape=[n_points, n_splits])

class geomstats.geometry.stratified.wald_space.WaldSpace(n_labels, ambient_space=None, equip=True)[source]#

Bases: PointSet

Class for the Wald space, a metric space for phylogenetic forests.

A topological space. Points in Wald space are instances of the class Wald: phylogenetic forests with edge weights between 0 and 1. In particular, Wald space is a stratified space, each stratum is called grove. The highest dimensional groves correspond to fully resolved or binary trees. The topology is obtained from embedding wälder into the ambient space of strictly positive \(n\times n\) symmetric matrices, implemented in the class spd.SPDMatrices.

Parameters:

n_labels (int) – Integer determining the number of labels in the forests, and thus the shape of the correlation matrices: n_labels x n_labels.

ambient_space#

The ambient space, the positive definite n_labels x n_labels matrices that the WaldSpace is embedded into.

Type:

Manifold

belongs(point, atol=1e-12)[source]#

Check if a point wald belongs to Wald space.

From FUTURE PUBLICATION we know that the corresponding matrix of a wald is strictly positive definite if and only if the labels are separated by at least one edge, which is exactly when the wald is an element of the Wald space.

Parameters:
  • point (Wald or WaldBatch) – The point to be checked.

  • atol (float) – Absolute tolerance. Optional, default: backend atol.

Returns:

belongs (array-like, shape=[…]) – Boolean denoting if point belongs to Wald space.

static default_metric()[source]#

Metric to equip the space with if equip is True.

lift(point)[source]#

Lift a point to the ambient space.

Parameters:

point (Wald or WaldBatch) – The point to be lifted.

Returns:

lifted_point (array-like, shape=[…, n_labels, n_labels]) – Point in the ambient space.

random_grove_point(topology, n_samples=1)[source]#

Sample a random point in a given grove of wald spcae.

Parameters:
  • topology (ForestTopology)

  • n_samples (int) – Number of samples. Defaults to 1.

Returns:

samples (Wald or WaldBatch) – Points sampled in Wald space.

random_point(n_samples=1, p_tree=0.9, p_keep=0.9, btol=1e-08)[source]#

Sample a random point in Wald space.

Parameters:
  • n_samples (int) – Number of samples. Defaults to 1.

  • p_tree (float between 0 and 1) – The probability that the sampled point is a tree, and not a forest. If the probability is equal to 1, then the sampled point will be a tree. Defaults to 0.9.

  • p_keep (float between 0 and 1) – The probability that a sampled edge is kept and not deleted randomly. To be precise, it is not exactly the probability, as some edges cannot be deleted since the requirement that two labels are separated by a split might be violated otherwise. Defaults to 0.9

  • btol (float) – Tolerance for the boundary of the coordinates in each grove. Defaults to 1e-08.

Returns:

samples (Wald or WaldBatch) – Points sampled in Wald space.

class geomstats.geometry.stratified.wald_space.WaldSpaceMetric(space, projection_solver=None, geodesic_solver=None)[source]#

Bases: PointSetMetric

Wald space metric.

Parameters:
  • space (WaldSpace) – Set to equip with metric.

  • projection_solver (ProjectionSolver) – Numerical solver to solve projection problem.

  • geodesic_solver (GeodesicSolver) – Numerical solver to solve geodesic problem.

discrete_geodesic(initial_point, end_point)[source]#

Compute a discrete geodesic in the WaldSpace.

Parameters:
  • initial_point (Wald or WaldBatch) – Point in the WaldSpace.

  • end_point (Wald or WaldBatch) – Point in the WaldSpace.

Returns:

geod_points (WaldBatch or list[WaldBatch]) – Time parameterized geodesic curve.

dist(point_a, point_b)[source]#

Distance between two points in the WaldSpace.

Parameters:
  • point_a (Wald or WaldBatch) – Point in the WaldSpace.

  • point_b (Wald or WaldBatch) – Point in the WaldSpace.

Returns:

distance (array-like, shape=[…]) – Distance.

geodesic(initial_point, end_point)[source]#

Compute the geodesic in the WaldSpace.

Parameters:
  • initial_point (Wald or WaldBatch) – Point in the WaldSpace.

  • end_point (Wald or WaldBatch) – Point in the WaldSpace.

Returns:

path (callable) – Time parameterized geodesic curve.

projection(ambient_point, topology, initial_weights=None)[source]#

Projects a point into Wald space.

Parameters:
  • ambient_point (array-like, shape=[…, n_nodes, n_nodes]) – Ambient point to project.

  • topology (ForestTopology or list[ForestTopology]) – Stratum topology.

  • initial_weights (array-like, shape=[…, n_splits]) – Initial guess for weights.

geomstats.geometry.stratified.wald_space.generate_random_wald(n_labels, p_keep, p_new, btol=1e-08, check=True)[source]#

Generate a random instance of class Wald.

Parameters:
  • n_labels (int) – The number of labels the wald is generated with respect to.

  • p_keep (float) – The probability will be inserted into the generation of a partition as well as for the generation of a split set for the topology of the wald.

  • p_new (float) – A float between 0 and 1, the probability that no new component is added, and probability of 1 - p_new_ that a new component is added.

  • btol (float) – Tolerance for the boundary of the coordinates in each grove. Defaults to 1e-08.

  • check (bool) – If True, checks if splits still separate all labels. In this case, the split will not be deleted. If False, any split can be randomly deleted.

Returns:

random_wald (Wald) – The randomly generated wald.

geomstats.geometry.stratified.wald_space.make_splits(n_labels)[source]#

Generate all possible splits of a collection.

Parameters:

n_labels (int) – Number of labels.

Returns:

split_iterator (Iterator[Split])

geomstats.geometry.stratified.wald_space.make_topologies(n_labels)[source]#

Generate all possible sets of compatible splits of a collection.

This only works well for len(n_labels) < 8.

Parameters:

n_labels (int) – Number of labels.

Returns:

topology_iterator (Iterator[ForestTopology])

Module contents#

The Stratified Space Geometry Package.