geomstats.geometry.stratified package#
Submodules#
geomstats.geometry.stratified.bhv_space module#
Class for the BHV Tree Space.
Lead author: Jonas Lueg
References
Billera, L. J., S. P. Holmes, K. Vogtmann. “Geometry of the Space of Phylogenetic Trees.” Advances in Applied Mathematics, volume 27, issue 4, pages 733-767, 2001. https://doi.org/10.1006%2Faama.2001.0759
Owen, M., J. S. Provan. “A Fast Algorithm for Computing Geodesic Distances in Tree Space.” IEEE/ACM Transactions on Computational Biology and Bioinformatics, volume 8, issue 1, pages 2-13, 2011. https://doi.org/10.1109/TCBB.2010.3
- class geomstats.geometry.stratified.bhv_space.BHVMetric(space)[source]#
Bases:
PointSetMetric
BHV metric for Tree Space for phylogenetic trees.
The BHV Tree Space as it is introduced in [BHV01], a metric space that is CAT(0), and there exist unique geodesics between each pair of points in the BHV Space. The polynomial time algorithm for computing the distance and geodesic between two points is implemented, following the definitions and results of [OP11]. There, computing the geodesic between two trees is called the ‘Geodesic Tree Path’ problem, and that is why some methods below (not visible to the user though) start with the letters ‘gtp’.
- Parameters:
total_space (TreeSpace) – Set with quotient structure.
- dist(point_a, point_b)[source]#
Compute the distance between two points.
- Parameters:
point_a (Tree or TreeBatch) – A point in BHV Space.
point_b (Tree or TreeBatch) – A point in BHV Space.
- Returns:
dist (array-like, shape=[…]) – The distance between the two points.
- geodesic(initial_point, end_point)[source]#
Compute the geodesic between two points.
- Parameters:
initial_point (Tree or TreeBatch) – A point in BHV Space.
end_point (Tree or TreeBatch) – A point in BHV Space.
- Returns:
geodesic (callable) – The geodesic between the two points. Takes parameter t, that is the time between 0 and 1 at which the corresponding point on the path is returned.
- class geomstats.geometry.stratified.bhv_space.GTPSolver(n_labels, tol=1e-08)[source]#
Bases:
object
‘Geodesic Tree Path’ problem solver [OP11].
Essentially uses Theorem 2.4 from [OP11].
- Parameters:
tol (float) – Tolerance for the algorithm, in particular for the decision problem in the GTP algorithm in [OP11] to avoid unambiguity.
- dist(point_a, point_b)[source]#
Compute the distance between two points.
- Parameters:
point_a (Tree or TreeBatch) – A point in BHV Space.
point_b (Tree or TreeBatch) – A point in BHV Space.
- Returns:
dist (array-like, shape=[…]) – The distance between the two points.
- geodesic(initial_point, end_point)[source]#
Compute the geodesic between two points.
- Parameters:
initial_point (Tree or TreeBatch) – A point in BHV Space.
end_point (Tree or TreeBatch) – A point in BHV Space.
- Returns:
geodesic (callable) – The geodesic between the two points. Takes parameter t, that is the time between 0 and 1 at which the corresponding point on the path is returned.
- class geomstats.geometry.stratified.bhv_space.Tree(splits, lengths)[source]#
Bases:
Point
A class for trees, that are phylogenetic trees, elements of the BHV space.
A tree is essentially a phylogenetic tree with edges having length greater than zero. The representation of the tree is via splits, the edge lengths are stored in a vector.
- Parameters:
splits (list[Split]) – The structure of the tree in form of a set of splits of the set of labels.
lengths (array-like, shape=[n_splits]) – The edge lengths of the splits, a vector containing positive numbers.
- topology#
The topology of the tree.
- Type:
- lengths#
The edge lengths of the splits, a vector containing positive numbers.
- Type:
array-like, shape=[n_splits]
- class geomstats.geometry.stratified.bhv_space.TreeBatch(iterable=(), /)[source]#
Bases:
PointBatch
Tree batch.
- property lengths#
Edge lengths.
- Returns:
lengths (array-like, shape=[n_points, n_splits])
- property topology#
Tree topology.
- Returns:
topology (list[TreeTopology])
- class geomstats.geometry.stratified.bhv_space.TreeSpace(n_labels, equip=True)[source]#
Bases:
PointSet
Class for the Tree space, a point set containing phylogenetic trees.
A topological space. Points in Tree space are instances of the class
Tree
: phylogenetic trees with edge lengths between 0 and infinity. For the space of trees see also [BHV01].- Parameters:
n_labels (int) – The number of labels in the trees.
- belongs(point, atol=1e-12)[source]#
Check if a point belongs to Tree space.
- Parameters:
point (Tree or TreeBatch) – The point to be checked.
atol (float) – Absolute tolerance. Optional, default: backend atol.
- Returns:
belongs (array-like, shape=[…]) – Boolean denoting if point belongs to Tree space.
- random_point(n_samples=1, p_keep=0.9, btol=1e-08)[source]#
Sample a random point in Tree space.
- Parameters:
n_samples (int) – Number of samples. Defaults to 1.
p_keep (float between 0 and 1) – The probability that a sampled edge is kept and not deleted randomly. To be precise, it is not exactly the probability, as some edges cannot be deleted since the requirement that two labels are separated by a split might be violated otherwise. Defaults to 0.9
btol (float) – Tolerance for the boundary of the edge lengths. Defaults to 1e-08.
- Returns:
samples (Tree or TreeBatch) – Points sampled in Tree space.
- class geomstats.geometry.stratified.bhv_space.TreeTopology(splits)[source]#
Bases:
ForestTopology
The topology of a tree, using a split-based representation.
- Parameters:
splits (list[Split]) – The structure of the tree in form of a set of splits of the set of labels.
- n_labels#
Number of labels, the set of labels is then \(\{0,\dots,n-1\}\).
- Type:
int
- where#
Give the index of a split in the flattened list of all splits.
- Type:
dict
- sep#
An increasing list of numbers between 0 and m, where m is the total number of splits in
self.split_sets
, starting with 0, where each number indicates that a new connected component starts at that index. Useful for example for unraveling the tuple of all splits intoself.split_sets
.- Type:
list of int
- paths#
A list of dictionaries, each dictionary is for the respective connected component of the forest, and the items of each dictionary are for each pair of labels u, v, u < v in the respective component, a list of the splits on the unique path between the labels u and v.
- Type:
list of dict
- support#
For each split, give an \(n\times n\) dimensional matrix, where the uv-th entry is
True
if the split separates the labels u and v, elseFalse
.- Type:
list of array-like
- property labels#
Node labels.
- Returns:
labels (tuple[int]) – Node labels.
- property splits#
Splits.
- Returns:
splits (list[Split]) – The structure of the tree in form of a set of splits of the set of labels.
- geomstats.geometry.stratified.bhv_space.generate_random_tree(n_labels, p_keep=0.9, btol=1e-08)[source]#
Generate a random instance of
Tree
.- Parameters:
p_keep (float between 0 and 1) – The probability that a sampled edge is kept and not deleted randomly. To be precise, it is not exactly the probability, as some edges cannot be deleted since the requirement that two labels are separated by a split might be violated otherwise. Defaults to 0.9
btol (float) – Tolerance for the boundary of the edge lengths. Defaults to 1e-08.
geomstats.geometry.stratified.graph_space module#
Graph Space.
Lead author: Anna Calissano.
References
Calissano, A., Feragen, A., Vantini, S. “Graph Space: Geodesic Principal Components for a Population of Network-valued Data.” Mox report 14, 2020. https://mox.polimi.it/reports-and-theses/publication-results/?id=855.
- class geomstats.geometry.stratified.graph_space.ExhaustiveAligner(total_space)[source]#
Bases:
GraphSpaceAlignerAlgorithm
Brute force exact alignment.
Exact Alignment obtained by exploring the whole permutation group.
- Parameters:
total_space (GraphSpace) – Set with quotient structure.
Notes
Not recommended for large n_nodes.
- class geomstats.geometry.stratified.graph_space.FAQAligner(total_space)[source]#
Bases:
GraphSpaceAlignerAlgorithm
Fast Quadratic Assignment for graph matching (or network alignment).
References
[Vogelstein2015]Vogelstein JT, Conroy JM, Lyzinski V, Podrazik LJ, Kratzer SG, Harley ET, Fishkind DE, Vogelstein RJ, Priebe CE. “Fast approximate quadratic programming for graph matching.“ PLoS One. 2015 Apr 17; doi: 10.1371/journal.pone.0121002.
- class geomstats.geometry.stratified.graph_space.GraphSpace(n_nodes, equip=True)[source]#
Bases:
Matrices
Class for the Graph Space.
Graph Space to analyse populations of labelled and unlabelled graphs. The space focuses on graphs with scalar euclidean attributes on nodes and edges, with a finite number of nodes and both directed and undirected edges. For undirected graphs, use symmetric adjacency matrices. The space is a quotient space obtained by applying the permutation action of nodes to the space of adjacency matrices. Notice that for computation reasons the module works with both the gs.array representation of graph and the GraphPoint representation.
Points are represented by \(nodes \times nodes\) adjacency matrices. Both the array input and the Graph Point type input work.
- Parameters:
n_nodes (int) – Number of graph nodes
total_space (space) – Total Space before applying the permutation action. Default: Euclidean Space.
References
[Calissano2020]Calissano, A., Feragen, A., Vantini, S. “Graph Space: Geodesic Principal Components for a Population of Network-valued Data.” Mox report 14, 2020. https://mox.polimi.it/reports-and-theses/publication-results/?id=855.
[Jain2009]Jain, B., Obermayer, K. “Structure Spaces.” Journal of Machine Learning Research, 10(11), 2009. https://www.jmlr.org/papers/volume10/jain09a/jain09a.pdf
- class geomstats.geometry.stratified.graph_space.GraphSpaceAligner(total_space, align_algo=None)[source]#
Bases:
Aligner
Graph space aligner.
- Parameters:
total_space (GraphSpace) – Set with quotient structure.
align_algo (GraphSpaceAlignerAlgorithm) – Algorihtm performing alignment.
- MAP_ALIGNER = {'FAQ': <class 'geomstats.geometry.stratified.graph_space.FAQAligner'>, 'exhaustive': <class 'geomstats.geometry.stratified.graph_space.ExhaustiveAligner'>}#
- align_point_to_geodesic(geodesic, point)[source]#
Align point to a geodesic.
Using the selected alignment technique, it returns the aligned point as optimally aligned to the geodesic.
- Parameters:
geodesic (function) – Geodesic.
point (array-like, shape=[…, n_nodes, n_nodes]) – Graph to align.
- Returns:
permuted_graph (list, shape = […, n_nodes, n_nodes])
- property perm_#
Optimal node permutations.
- Returns:
perm_ (array-like, shape=[…, n_nodes]) – Node permutations where in position i we have the value j meaning
the node i should be permuted with node j.
- set_alignment_algorithm(align_algo='FAQ', **kwargs)[source]#
Set the aligning strategy.
GraphSpace metric relies on alignment. In this module we propose the the FAQ graph matching by [Vogelstein2015], and exhaustive aligner which explores the whole permutation group.
- Parameters:
align_algo (str or GraphSpaceAlignerAlgorithm) – ‘FAQ’: Fast Quadratic Assignment - only compatible with Frobenius norm, ‘exhaustive’: all group exhaustive search
- set_point_to_geodesic_aligner(aligner='default', **kwargs)[source]#
Set the alignment between a point and a geodesic.
Following the geodesic to point alignment in [Calissano2020] and [Huckemann2010], this function defines the parameters [s_min, s_max] and the number of points to sample in the domain.
- Parameters:
aligner (BasePointToGeodesicAligner)
s_min (float) – Minimum value of the domain to sample along the geodesics.
s_max (float) – Minimum value of the domain to sample along the geodesics.
n_points (int) – Number of points to sample between s_min and s_max.
- class geomstats.geometry.stratified.graph_space.GraphSpaceAlignerAlgorithm(total_space)[source]#
Bases:
AlignerAlgorithm
,ABC
Base class for graph space numerical aligner.
- total_space#
Set with quotient structure.
- Type:
- perm_#
Node permutations where in position i we have the value j meaning the node i should be permuted with node j.
- Type:
array-like, shape=[…, n_nodes]
- class geomstats.geometry.stratified.graph_space.GraphSpaceQuotientMetric(space, total_space)[source]#
Bases:
QuotientMetric
Class for the Graph Space Metric.
Every metric \(d: X \times X \rightarrow \mathbb{R}\) on the total space of adjacency matrices can descend onto the quotient space as a pseudo-metric: \(d([x_1],[x_2]) = min_{t\in T} d_X(x_1, t^Tx_2t)\). The metric relies on the total space metric and an alignment procedure, i.e., Graph Matching or Networks alignment procedure. Metric, alignment, geodesics, and alignment with respect to a geodesic are defined. By default, the alignment is FAQ and the total space metric is the Frobenious norm.
- Parameters:
space (GraphSpace) – GraphSpace object.
References
[Calissano2020]Calissano, A., Feragen, A., Vantini, S. “Graph Space: Geodesic Principal Components for a Population of Network-valued Data.” Mox report 14, 2020. https://mox.polimi.it/reports-and-theses/publication-results/?id=855.
[Jain2009]Jain, B., Obermayer, K. “Structure Spaces.” Journal of Machine Learning Research 10.11 (2009). https://www.jmlr.org/papers/v10/jain09a.html.
- class geomstats.geometry.stratified.graph_space.PointToGeodesicAligner(total_space, s_min=0.0, s_max=1.0, n_grid=10)[source]#
Bases:
PointToGeodesicAlignerBase
Class for the alignment of the points with respect to a geodesic.
Implementing the algorithm in [Huckemann2010] to select an optimal alignment to a point with respect to a geodesic. The algorithm sample discrete set of n_points along the geodesic between [s_min, s_max] and find the permutation that gets closer to the datapoints along the geodesic.
- Parameters:
total_space (GraphSpace) – Set with quotient structure.
s_min (float) – Minimum value of the domain to sample along the geodesics.
s_max (float) – Minimum value of the domain to sample along the geodesics.
n_grid (int) – Number of points to sample between s_min and s_max.
References
[Calissano2020]Calissano, A., Feragen, A., Vantini, S. “Graph Space: Geodesic Principal Components for a Population of Network-valued Data.” Mox report 14, 2020. https://mox.polimi.it/reports-and-theses/publication-results/?id=855.
- align(geodesic, point)[source]#
Align the graph to the geodesic.
- Parameters:
geodesic (function) – Geodesic function in GraphSpace.
point (array-like, shape=[…, n_nodes, n_nodes]) – Graph to align.
- Returns:
aligned_point (array-like, shape=[…, n_nodes, n_nodes]) – Permuted graph as to be aligned with respect to the geodesic.
- dist(geodesic, point)[source]#
Compute the distance between the geodesic and the point.
- Parameters:
geodesic (function) – Geodesic function in GraphSpace.
point (array-like, shape=[…, n_nodes, n_nodes]) – Graph to align.
- Returns:
dist (array-like, shape=[…, n_nodes]) – Distance between the point and the geodesic.
Notes
Due to the discrete nature of the method, distance is not very accurate.
- property s_grid#
Save the domain discretization.
- class geomstats.geometry.stratified.graph_space.PointToGeodesicAlignerBase(total_space)[source]#
Bases:
ABC
Base class for point to geodesic aligner.
- Parameters:
total_space (GraphSpace) – Set with quotient structure.
- perm_#
Node permutations where in position i we have the value j meaning the node i should be permuted with node j.
- Type:
array-like, shape=[…, n_nodes]
geomstats.geometry.stratified.point_set module#
Class for Stratified Spaces.
Lead authors: Anna Calissano and Jonas Lueg
- class geomstats.geometry.stratified.point_set.PointBatch(iterable=(), /)[source]#
Bases:
ABC
,list
Class for point batch.
- class geomstats.geometry.stratified.point_set.PointSet(equip=True)[source]#
Bases:
ABC
Class for a set of points of type Point.
- abstract belongs(point, atol=1e-12)[source]#
Evaluate if a point belongs to the set.
- Parameters:
point (Point or PointBatch) – Point to evaluate.
atol (float) – Absolute tolerance. Optional, default: backend atol.
- Returns:
belongs (array-like, shape=[…]) – Boolean evaluating if point belongs to the set.
- class geomstats.geometry.stratified.point_set.PointSetMetric(space)[source]#
Bases:
ABC
Class for the lenght spaces.
- Parameters:
space (PointSet) – Set to equip with metric.
geomstats.geometry.stratified.quotient module#
Quotient structure for a geodesic metric space.
- class geomstats.geometry.stratified.quotient.Aligner(total_space, align_algo=None)[source]#
Bases:
ABC
Bundle structure.
- Parameters:
total_space (PointSet) – Set with quotient structure.
align_algo (AlignerAlgorithm) – Algorihtm performing alignment.
- class geomstats.geometry.stratified.quotient.QuotientMetric(space, total_space)[source]#
Bases:
PointSetMetric
,ABC
Quotient metric.
- Parameters:
space (PointSet) – Set to equip with metric.
total_space (PointSet) – Set with quotient structure.
- dist(point_a, point_b)[source]#
Compute distance between two points.
- Parameters:
point_a (array-like, shape=[…, *point_shape])
point_b (array-like, shape=[…, *point_shape])
- Returns:
distance (array-like, shape=[…]) – Distance between the points.
geomstats.geometry.stratified.spider module#
Class for the spider.
Lead authors: Anna Calissano & Jonas Lueg
- class geomstats.geometry.stratified.spider.Spider(n_rays, equip=True)[source]#
Bases:
PointSet
Spider: a set of rays attached to the origin.
The k-spider consists of k copies of the positive real line \(\mathbb{R}_{\geq 0}\) glued together at the origin [Feragen2020].
- Parameters:
n_rays (int) – Number of rays to attach to the origin.
References
[Feragen2020]Feragen, Aasa, and Tom Nye. “Statistics on stratified spaces.” Riemannian Geometric Statistics in Medical Image Analysis. Academic Press, 2020. 299-342.
- class geomstats.geometry.stratified.spider.SpiderMetric(space)[source]#
Bases:
PointSetMetric
Geometry on the Spider, induced by the rays metric.
- dist(point_a, point_b)[source]#
Compute the distance between two points on the Spider using the ray geometry.
The spider metric is the metric in each ray extended to the Spider: given two points x, y on different rays, d(x, y) = d(x, 0) + d(0, y).
- Parameters:
point_a (SpiderPoint or PointBatch) – Point in the Spider.
point_b (SpiderPoint or PointBatch) – Point in the Spider.
- Returns:
dist (array-like, shape=[…]) – Distance between points.
- geodesic(initial_point, end_point)[source]#
Return the geodesic between two lists of Spider points.
- Parameters:
initial_point (SpiderPoint or PointBatch) – Point in the Spider.
end_point (SpiderPoint or PointBatch) – Point in the Spider.
- Returns:
path (callable) – Return a vectorized geodesic function.
- class geomstats.geometry.stratified.spider.SpiderPoint(stratum, coord)[source]#
Bases:
Point
Class for points of the Spider.
A point in the Spider is \((s,c) \in \mathbb{N} \times \mathbb{R}\).
- Parameters:
stratum (int) – The stratum, an integer indicating the stratum the point lies in. If zero, then the point is on the origin.
coord (array-like, shape=[1,]) – A positive number, the coordinate of the point. It must be zero if and only if the stratum is zero, i.e. the origin.
geomstats.geometry.stratified.trees module#
Helper classes for tree spaces.
Class Split
.
Essentially, a Split
is a two-set partition of a subset of \(\{0,\dots,n-1\}\).
This class is designed such that one part of both parts of the partition can be empty.
Splits are corresponding uniquely to edges in a phylogenetic forest, where, if one cuts
the edge in the forest, the resulting two-set partition of the labels of the respective
component of the forest is the corresponding split.
Lead author: Jonas Lueg
- class geomstats.geometry.stratified.trees.ForestTopology(partition, split_sets)[source]#
Bases:
object
The topology of a forest, using a split-based graph-structure representation.
A forest topology is a partition into non-empty sets of the set \(\{0,\dots,n-1\}\), together with a set of splits for each element of the partition, where every split is a two-set partition of the respective element. A structure basically describes a phylogenetic forest, where each set of splits gives the structure of the tree with the labels of the corresponding element of the partition.
- Parameters:
partition (tuple) – A tuple of tuples that is a partition of the set \(\{0,\dots,n-1\}\), representing the label sets of each connected component of the forest topology.
split_sets (tuple) – A tuple of tuples containing splits, where each set of splits contains only splits of the respective label set in the partition, so their order is related. The splits are the edges of the connected components of the forest, respectively, and thus the union of all sets of splits yields all edges of the forest topology.
- n_labels#
Number of labels, the set of labels is then \(\{0,\dots,n-1\}\).
- Type:
int
- n_splits#
Number of splits.
- Type:
int
- where#
Give the index of a split in the flattened list of all splits.
- Type:
dict
- sep#
An increasing list of numbers between 0 and m, where m is the total number of splits in
self.split_sets
, starting with 0, where each number indicates that a new connected component starts at that index. Useful for example for unraveling the tuple of all splits intoself.split_sets
.- Type:
list of int
- paths#
A list of dictionaries, each dictionary is for the respective connected component of the forest, and the items of each dictionary are for each pair of labels u, v, u < v in the respective component, a list of the splits on the unique path between the labels u and v.
- Type:
list of dict
- support#
For each split, give an \(n\times n\) dimensional matrix, where the uv-th entry is
True
if the split separates the labels u and v, elseFalse
.- Type:
list of array-like
- corr(weights)[source]#
Compute the correlation matrix of the topology with edge weights
weights
.- Parameters:
weights (array-like, [n_splits]) – Edge weights.
- Returns:
corr (array-like, shape=[n, n]) – Returns the corresponding correlation matrix.
- corr_gradient(weights)[source]#
Compute the gradient of the correlation matrix, differentiated by weights.
- Parameters:
weights (array-like, [n_splits]) – The vector weights at which the gradient is computed.
- Returns:
gradient (array-like, shape=[n_splits, n, n]) – The gradient of the correlation matrix, differentiated by weights.
- class geomstats.geometry.stratified.trees.Split(part1, part2)[source]#
Bases:
object
Two-set partitions of sets.
Two-set partitions of a smaller subset of \(\{0,...,n-1\}\) are also allowed, where \(n\) is a positive integer, which is not passed as an argument as it is nowhere needed.
The parameters
part1
andpart2
are assigned to the attributesself.part1
andself.part2
in a unique sorted way: the one that contains the smallest minimal value is assigned toself.part1
for consistency.- Parameters:
part1 (iterable) – The first part of the split, an iterable that is a subset of \(\{0,\dots,n-1\}\). It may be empty, but must have empty intersection with
part2
.part2 (iterable) – The second part of the split, an iterable that is a subset of \(\{0,\dots,n-1\}\). It may be empty, but must have empty intersection with
part1
.
- get_part_away_from(other)[source]#
Return the part of this split that is directed away from other split.
- Parameters:
other (Split) – The other split.
- Returns:
part_that_does_not_point (iterable) – Return the part of the split
self
that does not point towardother
. Seeself.get_part_towards
for further explanation.
- get_part_towards(other)[source]#
Return the part of this split that is directed toward the other split.
Each split contains part1 and part2, the parts that the corresponding edge in the graph splits the set of labels into. Thus, one can think of the split as an edge, where part1 points in the direction of the part of the tree where the labels of part1 are contained, and part2 points in the other direction. So, part1 points in the direction of
other
, if it corresponds to an edge that is contained in the tree that part1 points to, else part2 points in the direction ofsplit
.- Parameters:
other (Split) – The other split.
- Returns:
part_towards (iterable) – Return the part of the split
self
that points towardother_split
.
- is_compatible(other)[source]#
Check whether this split is compatible with another split.
Two splits are compatible, if at least one intersection of the respective parts of the splits is empty.
- Parameters:
other (Split) – The other split.
- Returns:
is_compatible_with (bool) – Return
True
if the splits are compatible, elseFalse
.
- part_contains(subset)[source]#
Determine if a subset is contained in either part of a split.
- Parameters:
subset (set) – The subset containing labels.
- Returns:
is_contained (bool) – A boolean that is true if the subset is contained in
self.part1
orself.part2
.
- restrict_to(subset)[source]#
Return the restriction of a split to a subset.
- Parameters:
subset (set) – The subset that the split is restricted to.
- Returns:
restr_split (Split) – The restricted split, if the split is \(A\vert B\), then the split restricted to the subset \(C\) is \(A\cap C\vert B\cap C\).
- separates(u, v)[source]#
Determine whether the labels (or label sets) are separated by the split.
- Parameters:
u (list of int, int) – Either an integer or a set of labels.
v (list of int, int) – Either an integer or a set of labels.
- Returns:
is_separated (bool) –
- A boolean determining whether u and v are separated by the split (i.e. if
they are not in the same part).
- geomstats.geometry.stratified.trees.check_if_separated(labels, splits)[source]#
Check for each pair of labels if exists split that separates them.
- Parameters:
labels (list[int]) – A list of integers, the set of labels that we generate splits for.
splits (list[Split]) – A list of splits of the set of labels.
- Returns:
are_separated (bool) – True if the labels are pair-wise separated by a split else False.
- geomstats.geometry.stratified.trees.delete_splits(splits, labels, p_keep, check=True)[source]#
Delete splits randomly from a set of splits.
We require the splits to satisfy the check for if all pair-wise labels are separated. In this way, before deleting a split, this condition is checked to make sure it is not violated.
- Parameters:
splits (list[Split]) – A list of splits of the set of labels.
labels (list[int]) – A list of integers, the set of labels that we generate splits for.
p_keep (float) – A float between 0 and 1 determining the probability with which a split is kept and not deleted.
check (bool) – If True, checks if splits still separate all labels. In this case, the split will not be deleted. If False, any split can be randomly deleted.
- Returns:
left_over_splits (list[Split]) – The list of splits that are not deleted.
- geomstats.geometry.stratified.trees.generate_splits(labels)[source]#
Generate random maximal set of compatible splits of set
labels
.This method works inductively on the number of elements in labels. Start with a split of two randomly chosen labels. Then, successively choose a label from the labels and add this as a leaf with a split to the existing tree by attaching it to a random split, thereby dividing this split into two splits and one has to update all the other splits accordingly.
- Parameters:
labels (list[int]) – A list of integers, the set of labels that we generate splits for.
- Returns:
splits (list[Split]) – A list of splits of the set of labels, maximal number of splits.
geomstats.geometry.stratified.vectorization module#
Decorator to handle vectorization.
- geomstats.geometry.stratified.vectorization.broadcast_lists(*lists)[source]#
Broadcast lists.
Similar behavior as
gs.broadcast_arrays
, but for lists.- Parameters:
*lists (list)
- Returns:
*broadcasted_lists (list) – Lists with broadcasted length.
- geomstats.geometry.stratified.vectorization.vectorize_point(*args_positions, manipulate_input=<function _manipulate_input>, manipulate_output=<function _manipulate_output>)[source]#
Check point type and transform in iterable if not the case.
- Parameters:
args_positions (tuple) – Position and corresponding argument name. A tuple for each position.
Notes
Explicitly defining args_positions and args names ensures it works for all combinations of input calling.
geomstats.geometry.stratified.wald_space module#
Classes for the Wald Space and elements therein of class Wald and helper classes.
Lead author: Jonas Lueg
References
Garba, M. K., T. M. W. Nye, J. Lueg and S. F. Huckemann. “Information geometry for phylogenetic trees” Journal of Mathematical Biology, 82(3):19, February 2021a. https://doi.org/10.1007/s00285-021-01553-x.
Lueg, J., M. K. Garba, T. M. W. Nye, S. F. Huckemann. “Wald Space for Phylogenetic Trees.” Geometric Science of Information, Lecture Notes in Computer Science, pages 710–717, Cham, 2021. https://doi.org/10.1007/978-3-030-80209-7_76.
- class geomstats.geometry.stratified.wald_space.BasicWaldGeodesicSolver(space)[source]#
Bases:
ABC
Abstract class for wald geodesic solver.
- Parameters:
space (WaldSpace)
- discrete_geodesic(initial_point, end_point)[source]#
Compute a discrete geodesic in the WaldSpace.
- Parameters:
initial_point (Wald or WaldBatch) – Point in the WaldSpace.
initial_point (Wald or WaldBatch) – Point in the WaldSpace.
- Returns:
geod_points (WaldBatch or list[WaldBatch]) – Time parameterized geodesic curve.
- class geomstats.geometry.stratified.wald_space.DiscreteWaldPath(space, path, interpolator=None, **interpolator_kwargs)[source]#
Bases:
object
A uniformly-sampled discrete path.
- Parameters:
space (WaldSpace)
path (WaldBatch) – Wald collection with common topology.
interpolator (Interpolator1D)
- class geomstats.geometry.stratified.wald_space.LocalProjectionSolver(space, btol=1e-10)[source]#
Bases:
object
Local projection solver.
- projection(ambient_point, topology, initial_weights=None)[source]#
Project ambient point into wald space.
- Parameters:
ambient_point (array-like, shape=[…, n_nodes, n_nodes]) – Ambient point to project.
topology (ForestTopology or list[ForestTopology]) – Stratum topology.
initial_weights (array-like, shape=[…, n_splits]) – Initial guess for weights.
- class geomstats.geometry.stratified.wald_space.NaiveProjectionGeodesicSolver(space, n_grid=10)[source]#
Bases:
BasicWaldGeodesicSolver
Naive geodesic projection solver.
Implementation of algorithm 1 from [Lueg21].
- class geomstats.geometry.stratified.wald_space.SuccessiveProjectionGeodesicSolver(space, n_grid=10)[source]#
Bases:
BasicWaldGeodesicSolver
Successive projection geodesic projection solver.
Implementation of algorithm 2 from [Lueg21].
- class geomstats.geometry.stratified.wald_space.Wald(topology, weights)[source]#
Bases:
Point
A class for wälder, that are phylogenetic forests, elements of the Wald Space.
A wald is essentially a phylogenetic forest with weights between zero and one on the edges. The forest structure is stored as a
ForestTopology
and the edge weights are an array of length that is equal to the total number of splits in the structure. These elements are the points in Wald space and other phylogenetic forest spaces, like BHV space, although the partition is just the whole set of labels in this case.- Parameters:
topology (ForestTopology) – The structure of the forest.
weights (array-like, shape=[n_splits]) – The edge weights, array of floats between 0 and 1, with m entries, where m is the total number of splits/edges in the structure
top
.corr (array-like, shape=[n_labels, n_labels]) – Correlation matrix of the topology with edge weights.
- class geomstats.geometry.stratified.wald_space.WaldBatch(iterable=(), /)[source]#
Bases:
PointBatch
Wald batch.
- property corr#
Correlation matrix of the topology with edge weights.
- Returns:
corr (array-like, shape=[n_points, n_nodes, n_nodes])
- property topology#
Forest topology.
- Returns:
topology (list[ForestTopology])
- property weights#
Edge weights.
- Returns:
weights (array-like, shape=[n_points, n_splits])
- class geomstats.geometry.stratified.wald_space.WaldSpace(n_labels, ambient_space=None, equip=True)[source]#
Bases:
PointSet
Class for the Wald space, a metric space for phylogenetic forests.
A topological space. Points in Wald space are instances of the class
Wald
: phylogenetic forests with edge weights between 0 and 1. In particular, Wald space is a stratified space, each stratum is called grove. The highest dimensional groves correspond to fully resolved or binary trees. The topology is obtained from embedding wälder into the ambient space of strictly positive \(n\times n\) symmetric matrices, implemented in the classspd.SPDMatrices
.- Parameters:
n_labels (int) – Integer determining the number of labels in the forests, and thus the shape of the correlation matrices: n_labels x n_labels.
- ambient_space#
The ambient space, the positive definite n_labels x n_labels matrices that the WaldSpace is embedded into.
- Type:
- belongs(point, atol=1e-12)[source]#
Check if a point wald belongs to Wald space.
From FUTURE PUBLICATION we know that the corresponding matrix of a wald is strictly positive definite if and only if the labels are separated by at least one edge, which is exactly when the wald is an element of the Wald space.
- Parameters:
point (Wald or WaldBatch) – The point to be checked.
atol (float) – Absolute tolerance. Optional, default: backend atol.
- Returns:
belongs (array-like, shape=[…]) – Boolean denoting if point belongs to Wald space.
- lift(point)[source]#
Lift a point to the ambient space.
- Parameters:
point (Wald or WaldBatch) – The point to be lifted.
- Returns:
lifted_point (array-like, shape=[…, n_labels, n_labels]) – Point in the ambient space.
- random_grove_point(topology, n_samples=1)[source]#
Sample a random point in a given grove of wald spcae.
- Parameters:
topology (ForestTopology)
n_samples (int) – Number of samples. Defaults to 1.
- Returns:
samples (Wald or WaldBatch) – Points sampled in Wald space.
- random_point(n_samples=1, p_tree=0.9, p_keep=0.9, btol=1e-08)[source]#
Sample a random point in Wald space.
- Parameters:
n_samples (int) – Number of samples. Defaults to 1.
p_tree (float between 0 and 1) – The probability that the sampled point is a tree, and not a forest. If the probability is equal to 1, then the sampled point will be a tree. Defaults to 0.9.
p_keep (float between 0 and 1) – The probability that a sampled edge is kept and not deleted randomly. To be precise, it is not exactly the probability, as some edges cannot be deleted since the requirement that two labels are separated by a split might be violated otherwise. Defaults to 0.9
btol (float) – Tolerance for the boundary of the coordinates in each grove. Defaults to 1e-08.
- Returns:
samples (Wald or WaldBatch) – Points sampled in Wald space.
- class geomstats.geometry.stratified.wald_space.WaldSpaceMetric(space, projection_solver=None, geodesic_solver=None)[source]#
Bases:
PointSetMetric
Wald space metric.
- Parameters:
space (WaldSpace) – Set to equip with metric.
projection_solver (ProjectionSolver) – Numerical solver to solve projection problem.
geodesic_solver (GeodesicSolver) – Numerical solver to solve geodesic problem.
- discrete_geodesic(initial_point, end_point)[source]#
Compute a discrete geodesic in the WaldSpace.
- Parameters:
initial_point (Wald or WaldBatch) – Point in the WaldSpace.
end_point (Wald or WaldBatch) – Point in the WaldSpace.
- Returns:
geod_points (WaldBatch or list[WaldBatch]) – Time parameterized geodesic curve.
- dist(point_a, point_b)[source]#
Distance between two points in the WaldSpace.
- Parameters:
point_a (Wald or WaldBatch) – Point in the WaldSpace.
point_b (Wald or WaldBatch) – Point in the WaldSpace.
- Returns:
distance (array-like, shape=[…]) – Distance.
- geodesic(initial_point, end_point)[source]#
Compute the geodesic in the WaldSpace.
- Parameters:
initial_point (Wald or WaldBatch) – Point in the WaldSpace.
end_point (Wald or WaldBatch) – Point in the WaldSpace.
- Returns:
path (callable) – Time parameterized geodesic curve.
- projection(ambient_point, topology, initial_weights=None)[source]#
Projects a point into Wald space.
- Parameters:
ambient_point (array-like, shape=[…, n_nodes, n_nodes]) – Ambient point to project.
topology (ForestTopology or list[ForestTopology]) – Stratum topology.
initial_weights (array-like, shape=[…, n_splits]) – Initial guess for weights.
- geomstats.geometry.stratified.wald_space.generate_random_wald(n_labels, p_keep, p_new, btol=1e-08, check=True)[source]#
Generate a random instance of class
Wald
.- Parameters:
n_labels (int) – The number of labels the wald is generated with respect to.
p_keep (float) – The probability will be inserted into the generation of a partition as well as for the generation of a split set for the topology of the wald.
p_new (float) – A float between 0 and 1, the probability that no new component is added, and probability of 1 - p_new_ that a new component is added.
btol (float) – Tolerance for the boundary of the coordinates in each grove. Defaults to 1e-08.
check (bool) – If True, checks if splits still separate all labels. In this case, the split will not be deleted. If False, any split can be randomly deleted.
- Returns:
random_wald (Wald) – The randomly generated wald.
Module contents#
The Stratified Space Geometry Package.