# Learning#

## Estimating the mean#

Frechet mean.

Lead authors: Nicolas Guigui and Nina Miolane.

class geomstats.learning.frechet_mean.FrechetMean(metric, max_iter=32, epsilon=0.0001, point_type=None, method='default', init_point=None, init_step_size=1.0, verbose=False)[source]#

Empirical Frechet mean.

Parameters:
• metric (RiemannianMetric) – Riemannian metric.

• max_iter (int) – Maximum number of iterations for gradient descent. Optional, default: 32.

• epsilon (float) – Tolerance for stopping the gradient descent. Optional, default : 1e-4

• point_type (str, {'vector', 'matrix'}) – Point type. Optional, default: None.

• method (str, {'default', 'adaptive', 'batch'}) – Gradient descent method. The adaptive method uses a Levenberg-Marquardt style adaptation of the learning rate. The batch method is similar to the default method but for batches of equal length of samples. In this case, samples must be of shape [n_samples, n_batch, {dim, [n,n]}]. Optional, default: 'default'.

• init_point (array-like, shape=[{dim, [n, n]}]) – Initial point. Optional, default : None. In this case the first sample of the input data is used.

• init_step_size (float) – Initial step size or learning rate.

• verbose (bool) – Verbose option. Optional, default: False.

fit(X, y=None, weights=None)[source]#

Compute the empirical Frechet mean.

Parameters:
• X ({array-like, sparse matrix}, shape=[…, {dim, [n, n]}]) – Training input samples.

• y (array-like, shape=[…,] or […, n_outputs]) – Target values (class labels in classification, real numbers in regression). Ignored.

• weights (array-like, shape=[…,]) – Weights associated to the points. Optional, default: None.

Returns:

self (object) – Returns self.

geomstats.learning.frechet_mean.elastic_mean(points, weights=None, metric=None)[source]#

Compute the weighted mean of elastic curves.

SRV: Square Root Velocity.

SRV curves are a special case of Elastic curves.

The computation of the mean goes as follows: - Transform the curves into their SRVs/F-transform representations, - Compute the linear mean of the SRVs/F-transform representations, - Inverse-transform the mean in curve space.

Parameters:
• points (array-like, shape=[…, n_sampling_points, dim]) – Points on the manifold of curves (i.e. curves) to be averaged.

• weights (array-like, shape=[…,]) – Weights associated to the points (i.e. curves). Optional, default: None.

Returns:

mean (array-like, shape=[n_sampling_points, dim]) – Weighted linear mean of the points (i.e. of the curves).

geomstats.learning.frechet_mean.linear_mean(points, weights=None, point_type='vector')[source]#

Compute the weighted linear mean.

The linear mean is the Frechet mean when points:

• lie in a Euclidean space with Euclidean metric,

• lie in a Minkowski space with Minkowski metric.

Parameters:
• points (array-like, shape=[…, dim]) – Points to be averaged.

• weights (array-like, shape=[…,]) – Weights associated to the points. Optional, default: None.

Returns:

mean (array-like, shape=[dim,]) – Weighted linear mean of the points.

geomstats.learning.frechet_mean.variance(points, base_point, metric, weights=None, point_type='vector')[source]#

Variance of (weighted) points wrt a base point.

Parameters:
• points (array-like, shape=[…, dim]) – Points.

• weights (array-like, shape=[…,]) – Weights associated to the points. Optional, default: None.

Returns:

var (float) – Weighted variance of the points.

Exponential barycenter.

class geomstats.learning.exponential_barycenter.ExponentialBarycenter(group, max_iter=32, epsilon=1e-06, init_step_size=1.0, point_type=None, verbose=False)[source]#

Empirical Exponential Barycenter for Matrix groups.

Parameters:
• group (LieGroup) – Lie group instance on which the data lie.

• max_iter (int) – Maximum number of iterations to perform in the gradient descent. Optional, default: 32.

• epsilon (float) – Tolerance to reach convergence. The exstrinsic norm of the gradient is used as criterion. Optional, default: 1e-6.

• init_step_size (float) – Learning rate in the gradient descent. Optional, default: 1.

• verbose (bool) – Level of verbosity to inform about convergence. Optional, default: 1.

estimate_#
Type:

array-like, shape=[dim, dim]

fit(X, y=None, weights=None)[source]#

Compute the empirical Exponential Barycenter mean.

Parameters:
• X ({array-like, sparse matrix}, shape=[…, n_features]) – Training input samples.

• y (array-like, shape=[…,] or […, n_outputs]) – Target values (class labels in classification, real numbers in regression). Ignored.

• weights (array-like, shape=[…,]) – Weights associated to the points. Optional, default: None.

Returns:

self (object) – Returns self.

## Clustering#

K-means clustering.

class geomstats.learning.kmeans.RiemannianKMeans(metric, n_clusters=8, init='random', init_step_size=1.0, tol=0.01, max_iter=100, max_iter_mean=100, mean_method='default', verbose=0)[source]#

Class for k-means clustering on manifolds.

K-means algorithm using Riemannian manifolds.

Parameters:
• n_clusters (int) – Number of clusters (k value of the k-means). Optional, default: 8.

• metric (object of class RiemannianMetric) – The geomstats Riemannian metric associate to the space used.

• init (str or callable or array-like, shape=[n_clusters, n_features]) – How to initialize centroids at the beginning of the algorithm. The choice ‘random’ will select training points as initial centroids uniformly at random. The choice ‘kmeans++’ selects centroids heuristically to improve the convergence rate. When providing an array of shape `(n_clusters, n_features)`, the centroids are chosen as the rows of that array. When providing a callable, it receives as arguments the argument `X` to `fit()` and the number of centroids `n_clusters` and is expected to return an array as above. Optional, default: ‘random’.

• tol (float) – Convergence factor. Convergence is achieved when the difference of mean distance between two steps is lower than tol. Optional, default: 1e-2.

• max_iter (int) – Maximum number of iterations. Optional, default: 100

• max_iter_mean (int) – Maximum number of iterations for the gradient descent of each Frechet mean. Optional, default: 100.

• verbose (int) – If verbose > 0, information will be printed during learning. Optional, default: 0.

Example

Available example on the Poincaré Ball and Hypersphere manifolds `examples.plot_kmeans_manifolds`

fit(X)[source]#

Provide clusters centroids and data labels.

Alternate between computing the mean of each cluster and labelling data according to the new positions of the centroids.

Parameters:
• X (array-like, shape=[…, n_features]) – Training data, where n_samples is the number of samples and n_features is the number of features.

• max_iter (int) – Maximum number of iterations. Optional, default: 100.

Returns:

self (array-like, shape=[n_clusters,]) – Centroids.

predict(X)[source]#

Predict the labels for each data point.

Label each data point with the cluster having the nearest centroid using metric distance.

Parameters:

X (array-like, shape=[…, n_features]) – Input data.

Returns:

self (array-like, shape=[…,]) – Array of predicted cluster indices for each sample.

Online kmeans algorithm on Manifolds.

class geomstats.learning.online_kmeans.OnlineKMeans(metric, n_clusters, n_repetitions=20, atol=1e-05, max_iter=50000.0, point_type='vector')[source]#

Online k-means clustering.

Online k-means clustering seeks to divide a set of data points into a specified number of classes, while minimizing intra-class variance. It is closely linked to discrete quantization, which computes the closest approximation of the empirical distribution of the dataset by a discrete distribution supported by a smaller number of points with respect to the Wasserstein distance. The algorithm used can either be seen as an online version of the k-means algorithm or as Competitive Learning Riemannian Quantization (see [LBP2019]).

Parameters:
• metric (object) – Metric of the space in which the data lives. At each iteration, one of the cluster centers is moved in the direction of the new datum, according the exponential map of the underlying space, which is a method of metric.

• n_clusters (int) – Number of clusters of the k-means clustering, or number of desired atoms of the quantized distribution.

• n_repetitions (int, default=20) – The cluster centers are updated using decreasing step sizes, each of which stays constant for n_repetitions iterations to allow a better exploration of the data points.

• max_iter (int, default=5e4) – Maximum number of iterations. If it is reached, the quantization may be inacurate.

cluster_centers_#

Coordinates of cluster centers.

Type:

array, [n_clusters, n_features]

labels_#

Labels of each point.

Example

```>>> from geomstats.geometry.hypersphere import Hypersphere
>>> from geomstats.learning.onlinekmeans import OnlineKmeans
>>> sphere = Hypersphere(dim=2)
>>> metric = sphere.metric
>>> X = sphere.random_von_mises_fisher(kappa=10, n_samples=50)
>>> clustering = OnlineKmeans(metric=metric,n_clusters=4).fit(X)
>>> clustering.cluster_centers_
>>> clustering.labels_
```

References

[LBP2019]

A. Le Brigant and S. Puechmorel, Optimal Riemannian quantization with an application to air traffic analysis. J. Multivar. Anal. 173 (2019), 685 - 703.

fit(X)[source]#

Perform clustering.

Parameters:

X (array-like, shape=[n_features, n_samples]) – Samples to cluster.

predict(point)[source]#

Predict the closest cluster each sample in X belongs to.

Parameters:

X (array-like, shape=[n_features]) – New data to predict.

Returns:

labels (int) – Index of the cluster each sample belongs to.

geomstats.learning.online_kmeans.online_kmeans(X, metric, n_clusters, n_repetitions=20, atol=1e-05, max_iter=50000.0)[source]#

Perform online K-means clustering.

Perform online version of k-means algorithm on data contained in X. The data points are treated sequentially and the cluster centers are updated one at a time. This version of k-means avoids computing the mean of each cluster at each iteration and is therefore less computationally intensive than the offline version.

In the setting of quantization of probability distributions, this algorithm is also known as Competitive Learning Riemannian Quantization. It computes the closest approximation of the empirical distribution of data by a discrete distribution supported by a smaller number of points with respect to the Wasserstein distance. This smaller number of points is n_clusters.

Parameters:
• X (array-like, shape=[…, n_features]) – Input data. It is treated sequentially by the algorithm, i.e. one datum is chosen randomly at each iteration.

• metric (object) – Metric of the space in which the data lives. At each iteration, one of the cluster centers is moved in the direction of the new datum, according the exponential map of the underlying space, which is a method of metric.

• n_clusters (int) – Number of clusters of the k-means clustering, or number of desired atoms of the quantized distribution.

• n_repetitions (int, default=20) – The cluster centers are updated using decreasing step sizes, each of which stays constant for n_repetitions iterations to allow a better exploration of the data points.

• max_iter (int, default=5e4) – Maximum number of iterations. If it is reached, the quantization may be inacurate.

Returns:

• cluster_centers (array, shape=[n_clusters, n_features]) – Coordinates of cluster centers.

• labels (array, shape=[n_samples]) – Cluster labels for each point.

The Agglomerative Hierarchical Clustering (AHC) on manifolds.

class geomstats.learning.agglomerative_hierarchical_clustering.AgglomerativeHierarchicalClustering(n_clusters=2, distance='euclidean', memory=None, connectivity=None, compute_full_tree='auto', linkage='average', distance_threshold=None)[source]#

The Agglomerative Hierarchical Clustering on manifolds.

Recursively merges the pair of clusters that minimally increases a given linkage distance.

Parameters:
• n_clusters (int or None, default=2) – The number of clusters to find. It must be `None` if `distance_threshold` is not `None`.

• distance (str or callable, default=’euclidean’) – Distance used to compute the linkage. Can be ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’, ‘cosine’, or ‘precomputed’. If linkage is ‘ward’, only ‘euclidean’ is accepted. If ‘precomputed’, a distance matrix (instead of a similarity matrix) is needed as input for the fit method.

• memory (str or object, default=None) – Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.

• connectivity (array-like or callable, default=None) – Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix. Default is None, i.e, the hierarchical clustering algorithm is unstructured.

• compute_full_tree (‘auto’ or bool, default=’auto’) – Stop early the construction of the tree at n_clusters. This is useful to decrease computation time if the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be `True` if `distance_threshold` is not `None`. By default compute_full_tree is ‘auto’, which is equivalent to True when distance_threshold is not None or that n_clusters is inferior to the maximum between 100 or 0.02 * n_samples. Otherwise, ‘auto’ is equivalent to False.

• linkage ({‘ward’, ‘complete’, ‘average’, ‘single’}, default=’average’) – Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observation. The algorithm will merge the pairs of cluster that minimize this criterion. - average uses the average of the distances of each observation of

the two sets.

• complete or maximum linkage uses the maximum distances between all observations of the two sets.

• single uses the minimum of the distances between all observations of the two sets.

• ward minimizes the variance of the clusters being merged. It works for the ‘euclidean’ distance only.

• distance_threshold (float, default=None) – The linkage distance threshold above which, clusters will not be merged. If not `None`, `n_clusters` must be `None` and `compute_full_tree` must be `True`.

n_clusters_#

The number of clusters found by the algorithm. If `distance_threshold=None`, it will be equal to the given `n_clusters`.

Type:

int

labels_#

Cluster labels for each point.

Type:

ndarray, shape=[…,]

n_leaves_#

Number of leaves in the hierarchical tree.

Type:

int

n_connected_components_#

The estimated number of connected components in the graph.

Type:

int

children_#

The children of each non-leaf node. Values less than n_samples correspond to leaves of the tree which are the original samples. A node i greater than or equal to n_samples is a non-leaf node and has children children_[i - n_samples]. Alternatively at the i-th iteration, children[i] and children[i] are merged to form node n_samples + i.

Type:

array-like, shape=[n_samples-1, 2]

References

This algorithm uses the scikit-learn library: https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/ cluster/_agglomerative.py#L656

## Classification#

The MDM classifier on manifolds.

Lead authors: Daniel Brooks and Quentin Barthelemy.

class geomstats.learning.mdm.RiemannianMinimumDistanceToMeanClassifier(riemannian_metric, n_classes, point_type='matrix')[source]#

Minimum Distance to Mean (MDM) classifier on manifolds.

Classification by nearest centroid. For each of the given classes, a centroid is estimated according to the chosen metric. Then, for each new point, the class is affected according to the nearest centroid (see [BBCJ2012]).

Parameters:
• riemannian_metric (RiemannianMetric) – Riemannian metric to be used.

• n_classes (int) – Number of classes.

• point_type (str, {'vector', 'matrix'}) – Point type. Optional, default: 'matrix'.

mean_estimates_#

If fit, centroids computed on training set.

Type:

list

classes_#

If fit, classes of training set.

Type:

list

References

[BBCJ2012]

A. Barachant, S. Bonnet, M. Congedo and C. Jutten, Multiclass Brain-Computer Interface Classification by Riemannian Geometry. IEEE Trans. Biomed. Eng., vol. 59, pp. 920-928, 2012.

fit(X, y)[source]#

Compute Frechet mean of each class.

Parameters:
• X (array-like, shape=[n_samples, dim]) – if point_type=’vector’

shape=[n_samples, n, n] if point_type=’matrix’

Training data, where n_samples is the number of samples and n_features is the number of features.

• y (array-like, shape=[n_samples,]) – Training labels.

predict(X)[source]#

Compute closest neighbor according to riemannian_metric.

Parameters:
X (array-like, shape=[n_samples, dim]) – if point_type=’vector’

shape=[n_samples, n, n] if point_type=’matrix’

Test data, where n_samples is the number of samples and n_features is the number of features.

Returns:

y (array-like, shape=[n_samples,]) – Predicted labels.

predict_proba(X)[source]#

Compute probabilities.

Compute probabilities to belong to classes according to riemannian_metric.

Parameters:
X (array-like, shape=[n_samples, dim]) – if point_type=’vector’

shape=[n_samples, n, n] if point_type=’matrix’

Test data, where n_samples is the number of samples and n_features is the number of features.

Returns:

probas (array-like, shape=[n_samples, n_classes]) – Probability of the sample for each class in the model.

score(X, y, weights=None)[source]#

Compute score on the given test data and labels.

Compute the score defined as accuracy.

Parameters:
• X (array-like, shape=[n_samples, dim]) – if point_type=’vector’

shape=[n_samples, n, n] if point_type=’matrix’

Test data, where n_samples is the number of samples and n_features is the number of features.

• y (array-like, shape=[n_samples,]) – True labels for X.

• weights (array-like, shape=[n_samples,]) – Weights associated to the samples. Optional, default: None.

Returns:

score (float) – Mean accuracy of `self.predict(X)` wrt. y.

The KNN classifier on manifolds.

class geomstats.learning.knn.KNearestNeighborsClassifier(n_neighbors=5, weights='uniform', p=2, distance='minkowski', distance_params=None, n_jobs=None, **kwargs)[source]#

Classifier implementing the k-nearest neighbors vote on manifolds.

Parameters:
• n_neighbors (int, optional (default = 5)) – Number of neighbors to use by default.

• weights (string or callable, optional (default = ‘uniform’)) – Weight function used in prediction. Possible values: - ‘uniform’ : uniform weights. All points in each neighborhood

are weighted equally.

• ‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

• [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

• p (integer, optional (default = 2)) – Power parameter for the ‘minkowski’ string distance. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

• distance (string or callable, optional (default = ‘minkowski’)) – The distance metric to use. The default distance is minkowski, and with p=2 is equivalent to the standard Euclidean distance. See the documentation of the DistanceMetric class in the scikit-learn library for a list of available distances. If distance is “precomputed”, X is assumed to be a distance matrix and must be square during fit.

• distance_params (dict, optional (default = None)) – Additional keyword arguments for the distance function.

• n_jobs (int or None, optional (default = None)) – The number of parallel jobs to run for neighbors search. `None` means 1; `-1` means using all processors.

classes_#

Class labels known to the classifier

Type:

array, shape=[n_classes,]

effective_metric_#

The distance metric used. It will be same as the distance parameter or a synonym of it, e.g. ‘euclidean’ if the distance parameter set to ‘minkowski’ and p parameter set to 2.

Type:

string or callable

effective_metric_params_#

Additional keyword arguments for the distance function. For most distances will be same with distance_params parameter, but may also contain the p parameter value if the effective_metric_ attribute is set to ‘minkowski’.

Type:

dict

outputs_2d_#

False when y’s shape is (n_samples, ) or (n_samples, 1) during fit otherwise True.

Type:

bool

References

This algorithm uses the scikit-learn library: https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/ neighbors/_classification.py#L25

## Dimension reduction#

Principal Component Analysis on Manifolds.

class geomstats.learning.pca.TangentPCA(metric, n_components=None, copy=True, whiten=False, tol=0.0, iterated_power='auto', random_state=None)[source]#

Tangent Principal component analysis (tPCA).

Linear dimensionality reduction using Singular Value Decomposition of the Riemannian Log of the data at the tangent space of the Frechet mean.

Parameters:
• metric (RiemannianMetric) – Riemannian metric.

• n_components (int) – Number of principal components. Optional, default: None.

fit(X, y=None, base_point=None)[source]#

Fit the model with X.

Parameters:
• X (array-like, shape=[…, n_features]) – Training data, where n_samples is the number of samples and n_features is the number of features.

• y (Ignored (Compliance with scikit-learn interface))

• base_point (array-like, shape=[…, n_features], optional) – Point at which to perform the tangent PCA Optional, default to Frechet mean if None.

Returns:

self (object) – Returns the instance itself.

fit_transform(X, y=None, base_point=None)[source]#

Fit the model with X and apply the dimensionality reduction on X.

Parameters:
• X (array-like, shape=[…, n_features]) – Training data, where n_samples is the number of samples and n_features is the number of features.

• y (Ignored (Compliance with scikit-learn interface))

• base_point (array-like, shape=[…, n_features]) – Point at which to perform the tangent PCA Optional, default to Frechet mean if None.

Returns:

X_new (array-like, shape=[…, n_components]) – Projected data.

inverse_transform(X)[source]#

Low-dimensional reconstruction of X.

The reconstruction will match X_original whose transform would be X if n_components=min(n_samples, n_features).

Parameters:

X (array-like, shape=[…, n_components]) – New data, where n_samples is the number of samples and n_components is the number of components.

Returns:

X_original (array-like, shape=[…, n_features]) – Original data.

transform(X, y=None)[source]#

Project X on the principal components.

Parameters:
• X (array-like, shape=[…, n_features]) – Data, where n_samples is the number of samples and n_features is the number of features.

• y (Ignored (Compliance with scikit-learn interface))

Returns:

X_new (array-like, shape=[…, n_components]) – Projected data.