Learning#
Estimating the mean (or other central points)#
Frechet mean.
Lead authors: Nicolas Guigui and Nina Miolane.
- class geomstats.learning.frechet_mean.FrechetMean(metric, max_iter=32, epsilon=0.0001, method='default', init_point=None, init_step_size=1.0, verbose=False)[source]#
Empirical Frechet mean.
- Parameters:
metric (RiemannianMetric) – Riemannian metric.
max_iter (int) – Maximum number of iterations for gradient descent. Optional, default: 32.
epsilon (float) – Tolerance for stopping the gradient descent. Optional, default : 1e-4
method (str, {'default', 'adaptive', 'batch'}) – Gradient descent method. The adaptive method uses a Levenberg-Marquardt style adaptation of the learning rate. The batch method is similar to the default method but for batches of equal length of samples. In this case, samples must be of shape [n_samples, n_batch, *metric.shape]. Optional, default: 'default'.
init_point (array-like, shape=[*metric.shape]) – Initial point. Optional, default : None. In this case the first sample of the input data is used.
init_step_size (float) – Initial step size or learning rate.
verbose (bool) – Verbose option. Optional, default: False.
- fit(X, y=None, weights=None)[source]#
Compute the empirical weighted Frechet mean.
- Parameters:
X (array-like, shape=[n_samples, *metric.shape]) – Training input samples.
y (None) – Target values. Ignored.
weights (array-like, shape=[n_samples,]) – Weights associated to the samples. Optional, default: None, in which case it is equally weighted.
- Returns:
self (object) – Returns self.
- property method#
Gradient descent method.
- geomstats.learning.frechet_mean.elastic_mean(points, weights=None, metric=None)[source]#
Compute the weighted mean of elastic curves.
SRV: Square Root Velocity.
SRV curves are a special case of Elastic curves.
The computation of the mean goes as follows:
Transform the curves into their SRVs/F-transform representations,
Compute the linear mean of the SRVs/F-transform representations,
Inverse-transform the mean in curve space.
- Parameters:
points (array-like, shape=[n_samples, k_sampling_points, dim]) – Points on the manifold of curves (i.e. curves) to be averaged.
weights (array-like, shape=[n_samples,]) – Weights associated to the points (i.e. curves). Optional, default: None.
- Returns:
mean (array-like, shape=[k_sampling_points, dim]) – Weighted linear mean of the points (i.e. of the curves).
- geomstats.learning.frechet_mean.linear_mean(points, weights=None)[source]#
Compute the weighted linear mean.
The linear mean is the Frechet mean when points:
lie in a Euclidean space with Euclidean metric,
lie in a Minkowski space with Minkowski metric.
- Parameters:
points (array-like, shape=[n_samples, dim]) – Points to be averaged.
weights (array-like, shape=[n_samples,]) – Weights associated to the points. Optional, default: None.
- Returns:
mean (array-like, shape=[dim,]) – Weighted linear mean of the points.
- geomstats.learning.frechet_mean.variance(points, base_point, metric, weights=None)[source]#
Variance of (weighted) points wrt a base point.
- Parameters:
points (array-like, shape=[n_samples, dim]) – Points.
weights (array-like, shape=[n_samples,]) – Weights associated to the points. Optional, default: None.
- Returns:
var (float) – Weighted variance of the points.
Exponential barycenter.
Lead author: Nicolas Guigui.
- class geomstats.learning.exponential_barycenter.ExponentialBarycenter(group, max_iter=32, epsilon=1e-06, init_step_size=1.0, point_type=None, verbose=False)[source]#
Empirical exponential barycenter for matrix groups.
- Parameters:
group (LieGroup) – Lie group instance on which the data lie.
max_iter (int) – Maximum number of iterations to perform in the gradient descent. Optional, default: 32.
epsilon (float) – Tolerance to reach convergence. The extrinsic norm of the gradient is used as criterion. Optional, default: 1e-6.
init_step_size (float) – Learning rate in the gradient descent. Optional, default: 1.
verbose (bool) – Level of verbosity to inform about convergence. Optional, default: 1.
- estimate_#
If fit, exponential barycenter.
- Type:
array-like, shape=[dim, dim]
- fit(X, y=None, weights=None)[source]#
Compute the empirical weighted exponential barycenter.
- Parameters:
X (array-like, shape=[n_samples, dim, dim]) – Training input samples.
y (None) – Target values. Ignored.
weights (array-like, shape=[n_samples,]) – Weights associated to the samples. Optional, default: None, in which case it is equally weighted.
- Returns:
self (object) – Returns self.
Geometric median.
- class geomstats.learning.geometric_median.GeometricMedian(metric, max_iter=100, lr=1.0, init_point=None, print_every=None, epsilon=1e-12)[source]#
Geometric median.
- Parameters:
metric (RiemannianMetric) – Riemannian metric.
max_iter (int) – Maximum number of iterations for the algorithm. Optional, default : 100
lr (float) – Learning rate to be used for the algorithm. Optional, default : 1.0
init_point (array-like, shape=[*metric.shape]) – Initialization to be used in the start. Optional, default : None, in which case it uses last sample.
print_every (int) – Print updated median after print_every iterations. Optional, default : None
epsilon (float) – Tolerance for stopping the algorithm (distance between two successive estimates). Optional, default : gs.atol
References
[FVJ2009]Fletcher PT, Venkatasubramanian S and Joshi S. “The geometric median on Riemannian manifolds with application to robust atlas estimation”, NeuroImage, 2009 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735114/
- fit(X, y=None, weights=None)[source]#
Compute the weighted geometric median.
Compute the geometric median on manifold using Weiszfeld algorithm.
- Parameters:
X (array-like, shape=[n_samples, *metric.shape]) – Training input samples.
y (None) – Target values. Ignored.
weights (array-like, shape=[n_samples,]) – Weights associated to the samples. Optional, default: None, in which case it is equally weighted.
- Returns:
self (object) – Returns self.
Clustering#
K-means clustering.
Lead author: Hadi Zaatiti.
- class geomstats.learning.kmeans.RiemannianKMeans(metric, n_clusters=8, init='random', init_step_size=1.0, tol=0.01, max_iter=100, max_iter_mean=100, mean_method='default', verbose=0)[source]#
Class for k-means clustering on manifolds.
K-means algorithm using Riemannian manifolds.
- Parameters:
n_clusters (int) – Number of clusters (k value of the k-means). Optional, default: 8.
metric (object of class RiemannianMetric) – The geomstats Riemannian metric associate to the space used.
init (str or callable or array-like, shape=[n_clusters, n_features]) – How to initialize centroids at the beginning of the algorithm. The choice ‘random’ will select training points as initial centroids uniformly at random. The choice ‘kmeans++’ selects centroids heuristically to improve the convergence rate. When providing an array of shape
(n_clusters, n_features)
, the centroids are chosen as the rows of that array. When providing a callable, it receives as arguments the argumentX
tofit()
and the number of centroidsn_clusters
and is expected to return an array as above. Optional, default: ‘random’.tol (float) – Convergence factor. Convergence is achieved when the difference of mean distance between two steps is lower than tol. Optional, default: 1e-2.
max_iter (int) – Maximum number of iterations. Optional, default: 100
max_iter_mean (int) – Maximum number of iterations for the gradient descent of each Frechet mean. Optional, default: 100.
verbose (int) – If verbose > 0, information will be printed during learning. Optional, default: 0.
Example
Available example on the Poincaré Ball and Hypersphere manifolds
examples.plot_kmeans_manifolds
- fit(X)[source]#
Provide clusters centroids and data labels.
Alternate between computing the mean of each cluster and labelling data according to the new positions of the centroids.
- Parameters:
X (array-like, shape=[n_samples, n_features]) – Training data, where n_samples is the number of samples and n_features is the number of features.
max_iter (int) – Maximum number of iterations. Optional, default: 100.
- Returns:
self (array-like, shape=[n_clusters,]) – Centroids.
- predict(X)[source]#
Predict the labels for each data point.
Label each data point with the cluster having the nearest centroid using metric distance.
- Parameters:
X (array-like, shape=[…, n_features]) – Input data.
- Returns:
self (array-like, shape=[…,]) – Array of predicted cluster indices for each sample.
Online kmeans algorithm on Manifolds.
Lead author: Alice Le Brigant.
- class geomstats.learning.online_kmeans.OnlineKMeans(metric, n_clusters, n_repetitions=20, atol=1e-05, max_iter=50000.0, point_type='vector')[source]#
Online k-means clustering.
Online k-means clustering seeks to divide a set of data points into a specified number of classes, while minimizing intra-class variance. It is closely linked to discrete quantization, which computes the closest approximation of the empirical distribution of the dataset by a discrete distribution supported by a smaller number of points with respect to the Wasserstein distance. The algorithm used can either be seen as an online version of the k-means algorithm or as Competitive Learning Riemannian Quantization (see [LBP2019]).
- Parameters:
metric (object) – Metric of the space in which the data lives. At each iteration, one of the cluster centers is moved in the direction of the new datum, according the exponential map of the underlying space, which is a method of metric.
n_clusters (int) – Number of clusters of the k-means clustering, or number of desired atoms of the quantized distribution.
n_repetitions (int, default=20) – The cluster centers are updated using decreasing step sizes, each of which stays constant for n_repetitions iterations to allow a better exploration of the data points.
max_iter (int, default=5e4) – Maximum number of iterations. If it is reached, the quantization may be inacurate.
- cluster_centers_#
Coordinates of cluster centers.
- Type:
array, [n_clusters, n_features]
- labels_#
Labels of each point.
Example
>>> from geomstats.geometry.hypersphere import Hypersphere >>> from geomstats.learning.onlinekmeans import OnlineKmeans >>> sphere = Hypersphere(dim=2) >>> metric = sphere.metric >>> X = sphere.random_von_mises_fisher(kappa=10, n_samples=50) >>> clustering = OnlineKmeans(metric=metric,n_clusters=4).fit(X) >>> clustering.cluster_centers_ >>> clustering.labels_
References
[LBP2019]A. Le Brigant and S. Puechmorel, Optimal Riemannian quantization with an application to air traffic analysis. J. Multivar. Anal. 173 (2019), 685 - 703.
- geomstats.learning.online_kmeans.online_kmeans(X, metric, n_clusters, n_repetitions=20, atol=1e-05, max_iter=50000.0)[source]#
Perform online K-means clustering.
Perform online version of k-means algorithm on data contained in X. The data points are treated sequentially and the cluster centers are updated one at a time. This version of k-means avoids computing the mean of each cluster at each iteration and is therefore less computationally intensive than the offline version.
In the setting of quantization of probability distributions, this algorithm is also known as Competitive Learning Riemannian Quantization. It computes the closest approximation of the empirical distribution of data by a discrete distribution supported by a smaller number of points with respect to the Wasserstein distance. This smaller number of points is n_clusters.
- Parameters:
X (array-like, shape=[…, n_features]) – Input data. It is treated sequentially by the algorithm, i.e. one datum is chosen randomly at each iteration.
metric (object) – Metric of the space in which the data lives. At each iteration, one of the cluster centers is moved in the direction of the new datum, according the exponential map of the underlying space, which is a method of metric.
n_clusters (int) – Number of clusters of the k-means clustering, or number of desired atoms of the quantized distribution.
n_repetitions (int, default=20) – The cluster centers are updated using decreasing step sizes, each of which stays constant for n_repetitions iterations to allow a better exploration of the data points.
max_iter (int, default=5e4) – Maximum number of iterations. If it is reached, the quantization may be inacurate.
- Returns:
cluster_centers (array, shape=[n_clusters, n_features]) – Coordinates of cluster centers.
labels (array, shape=[n_samples]) – Cluster labels for each point.
The Agglomerative Hierarchical Clustering (AHC) on manifolds.
Lead author: Yann Cabanes.
- class geomstats.learning.agglomerative_hierarchical_clustering.AgglomerativeHierarchicalClustering(n_clusters=2, distance='euclidean', memory=None, connectivity=None, compute_full_tree='auto', linkage='average', distance_threshold=None)[source]#
The Agglomerative Hierarchical Clustering on manifolds.
Recursively merges the pair of clusters that minimally increases a given linkage distance.
- Parameters:
n_clusters (int or None, default=2) – The number of clusters to find. It must be
None
ifdistance_threshold
is notNone
.distance (str or callable, default=’euclidean’) – Distance used to compute the linkage. Can be ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’, ‘cosine’, or ‘precomputed’. If linkage is ‘ward’, only ‘euclidean’ is accepted. If ‘precomputed’, a distance matrix (instead of a similarity matrix) is needed as input for the fit method.
memory (str or object, default=None) – Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.
connectivity (array-like or callable, default=None) – Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix. Default is None, i.e, the hierarchical clustering algorithm is unstructured.
compute_full_tree (‘auto’ or bool, default=’auto’) – Stop early the construction of the tree at n_clusters. This is useful to decrease computation time if the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be
True
ifdistance_threshold
is notNone
. By default compute_full_tree is ‘auto’, which is equivalent to True when distance_threshold is not None or that n_clusters is inferior to the maximum between 100 or 0.02 * n_samples. Otherwise, ‘auto’ is equivalent to False.linkage ({‘ward’, ‘complete’, ‘average’, ‘single’}, default=’average’) – Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observation. The algorithm will merge the pairs of cluster that minimize this criterion.
average uses the average of the distances of each observation of the two sets.
complete or maximum linkage uses the maximum distances between all observations of the two sets.
single uses the minimum of the distances between all observations of the two sets.
ward minimizes the variance of the clusters being merged. It works for the ‘euclidean’ distance only.
distance_threshold (float, default=None) – The linkage distance threshold above which, clusters will not be merged. If not
None
,n_clusters
must beNone
andcompute_full_tree
must beTrue
.
- n_clusters_#
The number of clusters found by the algorithm. If
distance_threshold=None
, it will be equal to the givenn_clusters
.- Type:
int
- labels_#
Cluster labels for each point.
- Type:
ndarray, shape=[…,]
- n_leaves_#
Number of leaves in the hierarchical tree.
- Type:
int
- n_connected_components_#
The estimated number of connected components in the graph.
- Type:
int
- children_#
The children of each non-leaf node. Values less than n_samples correspond to leaves of the tree which are the original samples. A node i greater than or equal to n_samples is a non-leaf node and has children children_[i - n_samples]. Alternatively at the i-th iteration, children[i][0] and children[i][1] are merged to form node n_samples + i.
- Type:
array-like, shape=[n_samples-1, 2]
References
This algorithm uses the scikit-learn library: scikit-learn/scikit-learn cluster/_agglomerative.py#L656
Classification#
The MDM classifier on manifolds.
Lead authors: Daniel Brooks and Quentin Barthelemy.
- class geomstats.learning.mdm.RiemannianMinimumDistanceToMean(riemannian_metric)[source]#
Minimum Distance to Mean (MDM) classifier on manifolds.
Classification by nearest centroid. For each of the given classes, a centroid is estimated according to the chosen metric. Then, for each new point, the class is affected according to the nearest centroid [BBCJ2012].
- Parameters:
riemannian_metric (RiemannianMetric) – Riemannian metric to be used.
- classes_#
If fit, labels of training set.
- Type:
array-like, shape=[n_classes,]
- mean_estimates_#
If fit, centroids computed on training set.
- Type:
array-like, shape=[n_classes, *metric.shape]
References
[BBCJ2012]A. Barachant, S. Bonnet, M. Congedo and C. Jutten, Multiclass Brain-Computer Interface Classification by Riemannian Geometry. IEEE Trans. Biomed. Eng., vol. 59, pp. 920-928, 2012.
- fit(X, y, weights=None)[source]#
Compute Frechet mean of each class.
- Parameters:
X (array-like, shape=[n_samples, *metric.shape]) – Training input samples.
y (array-like, shape=[n_samples,]) – Training labels.
weights (array-like, shape=[n_samples,]) – Weights associated to the samples. Optional, default: None, in which case it is equally weighted.
- Returns:
self (object) – Returns self.
- predict(X)[source]#
Compute closest neighbor according to riemannian_metric.
- Parameters:
X (array-like, shape=[n_samples, *metric.shape]) – Test samples.
- Returns:
y (array-like, shape=[n_samples,]) – Predicted labels.
- predict_proba(X)[source]#
Compute probabilities.
Compute probabilities to belong to classes according to riemannian_metric.
- Parameters:
X (array-like, shape=[n_samples, *metric.shape]) – Test samples.
- Returns:
probas (array-like, shape=[n_samples, n_classes]) – Probability of the sample for each class in the model.
The KNN classifier on manifolds.
Lead author: Yann Cabanes.
- class geomstats.learning.knn.KNearestNeighborsClassifier(n_neighbors=5, weights='uniform', p=2, distance='minkowski', distance_params=None, n_jobs=None, **kwargs)[source]#
Classifier implementing the k-nearest neighbors vote on manifolds.
- Parameters:
n_neighbors (int, optional (default = 5)) – Number of neighbors to use by default.
weights (string or callable, optional (default = ‘uniform’)) – Weight function used in prediction. Possible values:
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
p (integer, optional (default = 2)) – Power parameter for the ‘minkowski’ string distance. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
distance (string or callable, optional (default = ‘minkowski’)) – The distance metric to use. The default distance is minkowski, and with p=2 is equivalent to the standard Euclidean distance. See the documentation of the DistanceMetric class in the scikit-learn library for a list of available distances. If distance is “precomputed”, X is assumed to be a distance matrix and must be square during fit.
distance_params (dict, optional (default = None)) – Additional keyword arguments for the distance function.
n_jobs (int or None, optional (default = None)) – The number of parallel jobs to run for neighbors search.
None
means 1;-1
means using all processors.
- classes_#
Class labels known to the classifier
- Type:
array, shape=[n_classes,]
- effective_metric_#
The distance metric used. It will be same as the distance parameter or a synonym of it, e.g. ‘euclidean’ if the distance parameter set to ‘minkowski’ and p parameter set to 2.
- Type:
string or callable
- effective_metric_params_#
Additional keyword arguments for the distance function. For most distances will be same with distance_params parameter, but may also contain the p parameter value if the effective_metric_ attribute is set to ‘minkowski’.
- Type:
dict
- outputs_2d_#
False when y’s shape is (n_samples, ) or (n_samples, 1) during fit otherwise True.
- Type:
bool
References
This algorithm uses the scikit-learn library: scikit-learn/scikit-learn neighbors/_classification.py#L25
Dimension reduction#
Principal Component Analysis on Manifolds.
Lead author: Nina Miolane.
- class geomstats.learning.pca.TangentPCA(metric, n_components=None, copy=True, whiten=False, tol=0.0, iterated_power='auto', random_state=None)[source]#
Tangent Principal component analysis (tPCA).
Linear dimensionality reduction using Singular Value Decomposition of the Riemannian Log of the data at the tangent space of the Frechet mean.
- Parameters:
metric (RiemannianMetric) – Riemannian metric.
n_components (int) – Number of principal components. Optional, default: None.
- fit(X, y=None, base_point=None)[source]#
Fit the model with X.
- Parameters:
X (array-like, shape=[…, n_features]) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored (Compliance with scikit-learn interface))
base_point (array-like, shape=[…, n_features], optional) – Point at which to perform the tangent PCA Optional, default to Frechet mean if None.
- Returns:
self (object) – Returns the instance itself.
- fit_transform(X, y=None, base_point=None)[source]#
Fit the model with X and apply the dimensionality reduction on X.
- Parameters:
X (array-like, shape=[…, n_features]) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored (Compliance with scikit-learn interface))
base_point (array-like, shape=[…, n_features]) – Point at which to perform the tangent PCA Optional, default to Frechet mean if None.
- Returns:
X_new (array-like, shape=[…, n_components]) – Projected data.
- inverse_transform(X)[source]#
Low-dimensional reconstruction of X.
The reconstruction will match X_original whose transform would be X if n_components=min(n_samples, n_features).
- Parameters:
X (array-like, shape=[…, n_components]) – New data, where n_samples is the number of samples and n_components is the number of components.
- Returns:
X_original (array-like, shape=[…, n_features]) – Original data.
- transform(X, y=None)[source]#
Project X on the principal components.
- Parameters:
X (array-like, shape=[…, n_features]) – Data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored (Compliance with scikit-learn interface))
- Returns:
X_new (array-like, shape=[…, n_components]) – Projected data.
Graphs#
Align All and Compute for Graph Space.
Lead author: Anna Calissano.
- class geomstats.learning.aac.AAC(metric, *args, estimate='frechet', **kwargs)[source]#
Class for Align all and Compute algorithm on Graph Space.
The Align All and Compute (AAC) algorithm is introduced in [Calissano2020] and it allows to compute different statistical estimators: the Frechet Mean, the Generalized Geodesic Principal components and the Regression for a set of labeled or unlabeled graphs. The idea is to optimally aligned the graphs to the current estimator using the correct alignment technique and compute the current estimation using the geometrical property of the total space, i.e., the Euclidean space of adjacency matrices.
- Parameters:
metric (GraphSpaceMetric) – Metric Class on Graph Space.
estimate (str) – Desired estimator. One of the following: - “frechet_mean”: Frechet Mean estimation [Calissano2020] - “ggpca”: Generalized Geodesic Principal Components [Calissano2020] - “regression”: Graph-on-vector regression model [Calissano2022]
Examples
Available example on Graph Space:
notebooks.19_practical_methods__aac
Available example on Graph Space with real world data:
notebooks.20_real_world_application__graph_space
References
[Calissano2020] (1,2,3)Calissano, A., Feragen, A., Vantini, S. “Graph Space: Geodesic Principal Components for a Population of Network-valued Data.” Mox report 14, 2020. https://mox.polimi.it/reports-and-theses/publication-results/?id=855.
[Calissano2022]Calissano, A., Feragen, A., Vantini, S. “Graph-valued regression: prediction of unlabelled networks in a non-Euclidean Graph Space.”Journal of Multivariate Analysis 190 - 104950, (2022). https://doi.org/10.1016/j.jmva.2022.104950.