Class for the Align All and Compute algorithm on Graph Space.
The Align All and Compute (AAC) algorithm is introduced in [Calissano2020]. It
computes different statistical estimators for a set of labeled or unlabeled
graphs: the Frechet Mean, the Generalized Geodesic Principal Components and the
Regression.
The idea is to optimally align the graphs to the current
estimator using the correct alignment technique, then compute the current estimate
using the geometrical properties of the total space, i.e., the Euclidean space of
adjacency matrices.
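As a rough illustration of the align-then-compute loop described above, the following sketch estimates a Frechet mean for small unlabeled graphs. It is not the library's implementation: the names `align` and `aac_mean` are hypothetical, graphs are plain NumPy adjacency matrices, and the alignment is a brute-force search over node permutations, which is only feasible for a handful of nodes.

```python
import itertools

import numpy as np


def align(graph, target):
    """Return the node-permuted copy of `graph` closest to `target`.

    Brute-force search over all node permutations (assumption: tiny graphs).
    """
    n_nodes = graph.shape[0]
    best, best_dist = graph, np.inf
    for perm in itertools.permutations(range(n_nodes)):
        idx = list(perm)
        candidate = graph[np.ix_(idx, idx)]  # permute rows and columns together
        dist = np.linalg.norm(candidate - target)  # Frobenius distance
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def aac_mean(graphs, max_iter=20, tol=1e-9):
    """Alternate alignment to the estimate and Euclidean averaging."""
    estimate = np.array(graphs[0], dtype=float)
    for _ in range(max_iter):
        aligned = [align(np.asarray(g, dtype=float), estimate) for g in graphs]
        new_estimate = np.mean(aligned, axis=0)  # mean in the total space
        shift = np.linalg.norm(new_estimate - estimate)
        estimate = new_estimate
        if shift < tol:
            break
    return estimate
```

Two graphs that differ only by a relabeling of their nodes are aligned onto each other, so their estimated mean is the graph itself rather than a blurred average of the two adjacency matrices.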
Parameters:
space (GraphSpace) – Graph space total space with a quotient structure.
estimate (str) – Desired estimator. One of the following:
Calissano, A., Feragen, A., Vantini, S.
“Graph-valued regression: prediction of unlabelled networks in a non-Euclidean
Graph Space.” Journal of Multivariate Analysis 190, 104950 (2022).
https://doi.org/10.1016/j.jmva.2022.104950.
The Agglomerative Hierarchical Clustering on manifolds.
Recursively merges the pair of clusters that minimally increases
a given linkage distance.
Parameters:
space (Manifold) – Equipped manifold.
n_clusters (int or None, default=2) – The number of clusters to find. It must be None if
distance_threshold is not None.
memory (str or object, default=None) – Used to cache the output of the computation of the tree.
By default, no caching is done. If a string is given, it is the
path to the caching directory.
connectivity (array-like or callable, default=None) – Connectivity matrix. Defines for each sample the neighboring
samples following a given structure of the data.
This can be a connectivity matrix itself or a callable that transforms
the data into a connectivity matrix. Default is None, i.e, the
hierarchical clustering algorithm is unstructured.
compute_full_tree (‘auto’ or bool, default=’auto’) – Stop early the construction of the tree at n_clusters. This is useful
to decrease computation time if the number of clusters is not small
compared to the number of samples. This option is useful only when
specifying a connectivity matrix. Note also that when varying the
number of clusters and using caching, it may be advantageous to compute
the full tree. It must be True if distance_threshold is not
None. By default compute_full_tree is ‘auto’, which is equivalent
to True when distance_threshold is not None or when n_clusters
is less than the maximum of 100 and 0.02 * n_samples.
Otherwise, ‘auto’ is equivalent to False.
linkage ({‘ward’, ‘complete’, ‘average’, ‘single’}, default=’average’) – Which linkage criterion to use. The linkage criterion determines which
distance to use between sets of observations. The algorithm will merge
the pair of clusters that minimizes this criterion.
average uses the average of the distances of each observation of
the two sets.
complete or maximum linkage uses the maximum distances between
all observations of the two sets.
single uses the minimum of the distances between all observations
of the two sets.
ward minimizes the variance of the clusters being merged.
It works for the ‘euclidean’ distance only.
distance_threshold (float, default=None) – The linkage distance threshold above which clusters will not be
merged. If not None, n_clusters must be None and
compute_full_tree must be True.
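To make the average, complete and single linkage criteria listed above concrete, here is a minimal sketch that evaluates them on a precomputed distance matrix. The function name `linkage_distance` is hypothetical and ward linkage is deliberately left out.

```python
def linkage_distance(dist, cluster_a, cluster_b, linkage="average"):
    """Linkage distance between two clusters given as lists of sample indices.

    `dist` is a square matrix (list of lists) of pairwise distances.
    """
    pairwise = [dist[i][j] for i in cluster_a for j in cluster_b]
    if linkage == "average":
        return sum(pairwise) / len(pairwise)
    if linkage == "complete":
        return max(pairwise)
    if linkage == "single":
        return min(pairwise)
    raise ValueError(f"Unsupported linkage: {linkage}")
```

At each step the agglomerative procedure would merge the pair of clusters for which this value is smallest.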
The children of each non-leaf node. Values less than n_samples
correspond to leaves of the tree which are the original samples.
A node i greater than or equal to n_samples is a non-leaf
node and has children children_[i - n_samples]. Alternatively
at the i-th iteration, children[i][0] and children[i][1]
are merged to form node n_samples + i.
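The indexing convention above can be checked with a small helper that walks the merge tree. This is only a sketch: `leaves_under` is a hypothetical name, and `children` stands for an array laid out as described.

```python
def leaves_under(node, children, n_samples):
    """Return the original sample indices gathered under `node`."""
    if node < n_samples:
        return [node]  # leaf: an original sample
    left, right = children[node - n_samples]
    return leaves_under(left, children, n_samples) + leaves_under(
        right, children, n_samples
    )
```

For example, with four samples and `children = [(0, 1), (2, 3), (4, 5)]`, node 4 is created at iteration 0 and contains samples 0 and 1, while node 6 is the root and contains all four samples.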
Array of the computed inverse of a function phi
whose expression is closed-form:
\(\sigma \mapsto \sigma^3 \times \frac{d}{d\sigma} \log \zeta_m(\sigma)\),
where \(\sigma\) denotes the variance,
\(\zeta_m\) the normalization coefficient,
and \(m\) the dimension.
weighted_distances (array-like, shape=[n_gaussians,]) – Mean of the weighted distances between training data
and current barycentres. The weight of each data sample
corresponds to the probability of belonging to a component
of the Gaussian mixture model.
Returns:
var (array-like, shape=[n_gaussians,]) – Estimated variances for each component of the GMM.
mixture_coefficients (array-like, shape=[n_gaussians,]) – Coefficients of the Gaussian mixture model.
mesh_data (array-like, shape=[n_precision, dim]) – Points at which the GMM probability density is computed.
Returns:
weighted_pdf (array-like, shape=[n_precision, n_gaussians,]) – Probability density function computed for each point of
the mesh data, for each component of the GMM.
A class for performing Expectation-Maximization to fit a Gaussian Mixture
Model (GMM) to data on a manifold. This method is only implemented for
the hypersphere and the Poincare ball.
Parameters:
space (Manifold) – Equipped manifold.
n_gaussians (int) – Number of Gaussian components in the mixture.
initialisation_method (basestring) – Optional, default: ‘random’.
Initialization method for variances, means and weights.
‘random’ : selects training points uniformly at random as
initial cluster centers.
‘kmeans’ : applies Riemannian k-means to deduce the
variances and means that the EM will use initially.
tol (float) – Optional, default: 1e-2.
Convergence tolerance. Convergence is reached when the difference of mean
distance between two steps is lower than tol.
max_iter (int) – Maximum number of iterations for the gradient descent.
Optional, default: 100.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
weights (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for weights parameter in fit.
Frechet mean of (weighted) points using adaptive time-steps.
The loss function optimized is \(||M_1(x)||_x\)
(where \(M_1(x)\) is the tangent mean at x) rather than
the mean-square-distance (MSD) because this simplifies computations.
Adaptivity is achieved with a Levenberg-Marquardt style weighting variable tau
between the first-order gradient descent and the second-order Gauss-Newton descent.
Parameters:
points (array-like, shape=[n_samples, *metric.shape]) – Points to be averaged.
weights (array-like, shape=[n_samples,], optional) – Weights associated to the points.
Returns:
current_mean (array-like, shape=[*metric.shape]) – Weighted Frechet mean of the points.
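The role of the tangent mean \(M_1(x)\) can be illustrated on the unit sphere. The sketch below uses a plain fixed-step descent that stops when the norm of the tangent mean is small; it does not implement the adaptive Levenberg-Marquardt weighting. All function names are hypothetical, and the data are assumed to contain no antipodal pairs.

```python
import numpy as np


def sphere_log(base, point):
    """Riemannian logarithm on the unit sphere (no antipodal points assumed)."""
    cos_angle = np.clip(np.dot(base, point), -1.0, 1.0)
    angle = np.arccos(cos_angle)
    if angle < 1e-12:
        return np.zeros_like(base)
    return angle / np.sin(angle) * (point - cos_angle * base)


def sphere_exp(base, tangent_vec):
    """Riemannian exponential on the unit sphere."""
    norm = np.linalg.norm(tangent_vec)
    if norm < 1e-12:
        return base
    return np.cos(norm) * base + np.sin(norm) * tangent_vec / norm


def frechet_mean(points, weights=None, step=1.0, max_iter=100, tol=1e-10):
    """Fixed-step descent driven by the weighted tangent mean."""
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    current_mean = points[0]
    for _ in range(max_iter):
        # Tangent mean at the current estimate: weighted average of logs.
        tangent_mean = sum(
            w * sphere_log(current_mean, p) for w, p in zip(weights, points)
        )
        if np.linalg.norm(tangent_mean) < tol:
            break
        current_mean = sphere_exp(current_mean, step * tangent_mean)
    return current_mean
```

For points placed symmetrically around the north pole, the tangent mean vanishes exactly at the pole, which is where the iteration settles.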
method (str, {'default', 'adaptive', 'batch'}) – Gradient descent method.
The adaptive method uses a Levenberg-Marquardt style adaptation of
the learning rate. The batch method is similar to the default
method but for batches of equal length of samples. In this case,
samples must be of shape [n_samples, n_batch, *space.shape].
Optional, default: 'default'.
initialization (str or array-like,) – {‘random’, ‘data’, ‘frechet’, ‘warm_start’}
Initial values of the parameters for the optimization,
or initialization method.
Optional, default: ‘random’
regularization (float) – Weight on the constraint for the intercept to lie on the manifold in
the extrinsic optimization scheme. An L^2 constraint is applied.
Optional, default: 1.
compute_training_score (bool) – Whether to compute R^2.
Optional, default: False.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
weights (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for weights parameter in score.
The incremental Frechet mean estimator calculates the sample Frechet mean by
moving iteratively along the geodesic between the current mean estimate
and the next point.
\[\text{Initialization}: m_{1} := X_{1}\]
\[\text{Update}: \text{Let } \gamma_k \text{ be the geodesic joining }
m_{k-1}\text{ and } X_{k} \text{, then }
m_{k} := \gamma_k(1/k) \,\, \forall\, 2 \leq k \leq N\]
Asymptotic convergence to the population Frechet mean is guaranteed for
simply connected, complete and non-positively curved Riemannian manifolds.
It is important to note that the estimator obtained in this iterative fashion
need not be a solution to the following optimization problem:
\[\min_{q \in M} \sum_{i=1}^{N} d(q, X_{i})^2\]
where d is the Riemannian distance. Also, the estimator is not permutation
invariant, i.e., the estimate might depend on the order in which
incremental updates are performed.
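The update rule can be sketched generically, given any function that evaluates the geodesic between two points at time t. The names below are hypothetical; the two geodesics are illustrative assumptions (the real line, and the positive reals with the metric dx / x), both of which are flat and therefore non-positively curved.

```python
def incremental_frechet_mean(points, geodesic):
    """m_1 = X_1, then m_k = gamma_k(1/k), gamma_k joining m_{k-1} to X_k."""
    iterator = iter(points)
    mean = next(iterator)
    for k, point in enumerate(iterator, start=2):
        mean = geodesic(mean, point, 1.0 / k)
    return mean


def euclidean_geodesic(start, end, t):
    """Straight line on the real line."""
    return start + t * (end - start)


def log_geodesic(start, end, t):
    """Geodesic of the positive reals with the metric dx / x."""
    return start ** (1.0 - t) * end ** t
```

With the Euclidean geodesic the recursion reproduces the arithmetic mean (1, 2, 6 gives 3), and with the second geodesic it reproduces the geometric mean (1, 4, 16 gives 4). In curved spaces the result generally depends on the order of the points, as noted above.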
clean_state (bool) – Whether to start from a clean state of the estimator instead of keeping track of the last iteration.
Notes
Required metric methods: geodesic.
References
[CHSV2016]
Cheng, Ho, Salehian, Vemuri.
“Recursive Computation of the Frechet Mean on Non-Positively
Curved Riemannian Manifolds with Applications”,
Riemannian Computing in Computer Vision pp 21-43, 2016.
https://link.springer.com/chapter/10.1007/978-3-319-22957-7_2
X (array-like, shape=[n_samples, {dim, [n, n]}]) – Training input samples.
y (None) – Ignored.
init (array-like, shape=[{dim, [n, n]}]) – If not None, the mean computation starts from init, which can be useful
when data arrives in a streaming setting.
Optional, default: None.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
init (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for init parameter in fit.
Class for a general Kalman filter working on Lie groups.
Given an adapted model, it provides the tools to carry out non-linear state
estimation with an error modeled on the Lie algebra. The model must provide
the functions to propagate and update a state, the observation model, and
the computation of the Jacobians.
Compute the Kalman gain given the observation model.
Given the observation Jacobian H and covariance N (not necessarily
equal to that of the sensor), and the current covariance P, the Kalman
gain is K = P H^T(H P H^T + N)^{-1}.
The state is updated by the matrix-vector product of the Kalman gain K
and the innovation. The possibly non-linear update function is provided
by the model.
Given the observation Jacobian H and covariance N, the current
covariance P is updated as (I - KH)P.
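The gain and covariance formulas above translate directly into NumPy. This is a minimal sketch with hypothetical function names; it covers only the linear algebra, not the model-specific state update.

```python
import numpy as np


def kalman_gain(cov, obs_jacobian, obs_cov):
    """K = P H^T (H P H^T + N)^{-1}."""
    innovation_cov = obs_jacobian @ cov @ obs_jacobian.T + obs_cov
    return cov @ obs_jacobian.T @ np.linalg.inv(innovation_cov)


def update_covariance(cov, gain, obs_jacobian):
    """P <- (I - K H) P."""
    identity = np.eye(cov.shape[0])
    return (identity - gain @ obs_jacobian) @ cov
```

With P the 2x2 identity, H = [[1, 0]] and N = [[1]], the gain is [[0.5], [0]]: only the observed component is corrected, and its variance is halved in the updated covariance.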
Class for modeling a non-linear 2D localization problem.
The state is composed of a planar orientation and position, and is thus a
member of SE(2).
A sensor provides the linear and angular speed, while another one provides
sparse position observations.
Construct the matrix associated to the adjoint representation.
The inner automorphism is given by \(Ad_X : g \mapsto X g X^{-1}\). For a
state \(X = (\theta, x, y)\), the matrix associated to its tangent
map, the adjoint representation, is
\(\begin{bmatrix} 1 & 0 \\ -J [x, y] & R(\theta) \end{bmatrix}\),
where \(R(\theta)\) is the rotation matrix of angle \(\theta\), and
\(J = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\).
Parameters:
state (array-like, shape=[dim]) – Vector representing a state.
Returns:
adjoint (array-like, shape=[dim, dim]) – Adjoint representation of the state.
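The block structure of this matrix can be assembled as follows. The function names are hypothetical, and the state is assumed to be ordered as (theta, x, y).

```python
import numpy as np


def rotation_matrix(theta):
    """2D rotation matrix of angle theta."""
    return np.array(
        [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
    )


def adjoint_map(state):
    """Adjoint representation of an SE(2) state (theta, x, y)."""
    theta, x, y = state
    j_matrix = np.array([[0.0, -1.0], [1.0, 0.0]])
    adjoint = np.eye(3)
    adjoint[1:, 0] = -j_matrix @ np.array([x, y])  # lower-left block: -J [x, y]
    adjoint[1:, 1:] = rotation_matrix(theta)  # lower-right block: R(theta)
    return adjoint
```

The identity state (0, 0, 0) yields the 3x3 identity matrix, as expected for the adjoint of the neutral element.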
For an observation y and an orientation theta, the modified observation
considered for the innovation is \(R(\theta)^T y\) [BB2017], so the
covariance N is rotated accordingly as \(R(\theta)^T N R(\theta)\).
Parameters:
state (array-like, shape=[dim]) – Vector representing a state.
observation_cov (array-like, shape=[dim_obs, dim_obs]) – Covariance matrix associated to the sensor.
Returns:
covariance (array-like, shape=[dim_obs, dim_obs]) – Covariance of the observation.
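A short sketch of the rotated covariance \(R(\theta)^T N R(\theta)\), assuming the orientation is the first entry of the state and that the observation is two-dimensional. The function name is hypothetical.

```python
import numpy as np


def rotated_observation_cov(state, observation_cov):
    """Return R(theta)^T N R(theta) for a state whose first entry is theta."""
    theta = state[0]
    rot = np.array(
        [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
    )
    return rot.T @ observation_cov @ rot
```

For a quarter turn and N = diag(1, 4), the two variances swap axes and the result is diag(4, 1).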
Propagate state with constant velocity motion model on SE(2).
From a given state (orientation, position) pair \((\theta, x)\),
a new one is obtained as \((\theta + dt * \omega,
x + dt * R(\theta) u)\), where the time step dt and the linear and angular
velocities u and \(\omega\) are given by some sensor (e.g., odometers).
Parameters:
state (array-like, shape=[dim]) – Vector representing a state (orientation, position).
sensor_input (array-like, shape=[4]) – Vector representing the information from the sensor.
Returns:
new_state (array-like, shape=[dim]) – Vector representing the propagated state.
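The propagation step can be written out explicitly. In this sketch the layout of the sensor vector, (dt, u_x, u_y, omega), is an assumption made for illustration, and `propagate_se2` is a hypothetical name.

```python
import math


def propagate_se2(state, sensor_input):
    """Constant-velocity propagation of a planar pose (theta, x, y)."""
    theta, x, y = state
    dt, u_x, u_y, omega = sensor_input  # assumed ordering of the sensor vector
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Position moves by dt * R(theta) u, orientation by dt * omega.
    new_x = x + dt * (cos_t * u_x - sin_t * u_y)
    new_y = y + dt * (sin_t * u_x + cos_t * u_y)
    return (theta + dt * omega, new_x, new_y)
```

A robot facing along the y axis (theta = pi/2) that drives forward at unit speed for two time units ends up two units further along y, with its orientation unchanged.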
Class for modeling a linear 1D localization problem.
The state is made of a scalar position and scalar speed, thus a 2D vector.
A sensor provides acceleration inputs, while another one provides sparse
measurements of the position.
Propagate with piece-wise constant acceleration and velocity.
Takes a given (position, speed) pair \((x, v)\) and creates a new
one \((x + dt * v, v + dt * acc)\), where the time step and the
acceleration are given by an accelerometer.
Parameters:
state (array-like, shape=[dim]) – Vector representing a state (position, speed).
sensor_input (array-like, shape=[2]) – Vector representing the information from the accelerometer.
Returns:
new_state (array-like, shape=[dim]) – Vector representing the propagated state.
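The same step in the linear 1D case is a two-line update. The ordering (dt, acc) of the sensor vector is assumed here, and `propagate_1d` is a hypothetical name.

```python
def propagate_1d(state, sensor_input):
    """Propagate a (position, speed) pair with constant acceleration."""
    position, speed = state
    dt, acc = sensor_input  # assumed ordering of the accelerometer vector
    return (position + dt * speed, speed + dt * acc)
```

Starting from position 0 and speed 1, a step of dt = 0.5 with acceleration 2 gives position 0.5 and speed 2.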
Classifier implementing the kernel density estimation on manifolds.
The kernel density estimation classifier classifies the data according to
a kernel density estimation of each dataset on the manifold. The density
estimation is performed using radial kernel functions: the distance
is the only geometrical tool used to estimate the density on the manifold.
This classifier inherits from the radius neighbors classifier of the
scikit-learn library; we expect the classifier presented here to be easier
to use on manifolds.
Compared with the radius neighbors classifier, we force the
parameter ‘algorithm’ to be equal to ‘brute’ in order to
be compatible with any metric.
We also changed some default values of the scikit-learn algorithm in order
to take into account every point of the dataset during the kernel density
estimation, i.e. the default value of the parameter ‘radius’ is set to
infinity instead of 1 and the default value of the parameter ‘weight’ is
set to ‘distance’ instead of ‘uniform’.
Our main contribution is a greater choice of kernel functions,
see the radial_kernel_functions.py file in the learning directory.
The radial kernel functions are now easier to define by a user:
the input data should be an array of distances instead of an array of
arrays. Moreover the new parameter ‘bandwidth’ of our classifier can be
used to adapt the kernel function to the size of the dataset.
The scikit-learn library also provides a kernel density estimation tool
(see sklearn.neighbors.KernelDensity), however this algorithm is not built
as a classifier and is not available with all metrics.
Parameters:
space (Manifold) – Equipped manifold.
radius (float, optional (default = inf)) – Range of parameter space to use by default.
kernel (string or callable, optional (default = ‘distance’)) – Kernel function used in prediction. Possible values:
‘distance’ : weight points by the inverse of their distance.
In this case, closer neighbors of a query point will have a
greater influence than neighbors which are further away.
‘uniform’ : uniform weights. All points in each neighborhood
are weighted equally.
[callable] : a user-defined function which accepts an
array of distances, and returns an array of the same shape
containing the weights.
bandwidth (float, optional (default = 1.0)) – Bandwidth parameter used for the kernel. The bandwidth parameter is
used if and only if the kernel is a callable function.
outlier_label ({manual label, ‘most_frequent’}, optional (default = None)) – Label for outlier samples (samples with no neighbors in given radius).
manual label: str or int label (should be the same type as y)
or list of manual labels if multi-output is used.
‘most_frequent’ : assign the most frequent label of y to outliers.
None : when any outlier is detected, ValueError will be raised.
n_jobs (int or None, optional (default = None)) – The number of parallel jobs to run for neighbors search.
None means 1; -1 means using all processors.
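To show what a callable kernel and the bandwidth do, here is a minimal sketch of a radial kernel that maps distances to weights, together with a weighted vote. Both function names and the `(distances, bandwidth)` signature are assumptions for illustration, not the library's API.

```python
import math


def gaussian_radial_kernel(distances, bandwidth=1.0):
    """Map distances to weights with a Gaussian profile."""
    return [math.exp(-((d / bandwidth) ** 2) / 2) for d in distances]


def kde_classify(distances, labels, kernel, bandwidth=1.0):
    """Return the label whose training points carry the largest total weight.

    `distances[i]` is the distance from the query point to training point i.
    """
    weights = kernel(distances, bandwidth)
    scores = {}
    for weight, label in zip(weights, labels):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)
```

With a small bandwidth the two nearby points of one class dominate the vote. With a very large bandwidth all weights approach 1 and the most populated class wins, which is why the bandwidth should be adapted to the dataset.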
The distance metric used. It will be the same as the distance parameter
or a synonym of it, e.g. ‘euclidean’ if the distance parameter is set to
‘minkowski’ and the p parameter is set to 2.
Additional keyword arguments for the distance function.
For most distances this will be the same as the distance_params parameter,
but it may also contain the p parameter value if the
effective_metric_ attribute is set to ‘minkowski’.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
n_clusters (int) – Number of clusters (k value of the k-means).
Optional, default: 8.
init (str or callable or array-like, shape=[n_clusters, n_features]) – How to initialize cluster centers at the beginning of the algorithm. The
choice ‘random’ will select training points as initial cluster centers
uniformly at random. The choice ‘kmeans++’ selects cluster centers
heuristically to improve the convergence rate. When providing an array
of shape (n_clusters,n_features), the cluster centers are chosen as the
rows of that array. When providing a callable, it receives as arguments
the argument X to fit() and the number of cluster centers
n_clusters and is expected to return an array as above.
Optional, default: ‘random’.
tol (float) – Convergence factor. Convergence is achieved when the difference of mean
distance between two steps is lower than tol.
Optional, default: 1e-2.
max_iter (int) – Maximum number of iterations.
Optional, default: 100.
verbose (int) – If verbose > 0, information will be printed during learning.
Optional, default: 0.
Notes
Required metric methods: dist.
Example
An example on the Poincaré ball and hypersphere manifolds is available in
examples.plot_kmeans_manifolds.
n_clusters (int) – Number of clusters (k value of k-medoids).
Optional, default: 8.
max_iter (int) – Maximum number of iterations.
Optional, default: 100.
init (str) – How to initialize cluster centers at the beginning of the algorithm. The
choice ‘random’ will select training points as initial cluster centers
uniformly at random.
Optional, default: ‘random’.
n_jobs (int) – Number of jobs to run in parallel. -1 means using all processors.
Optional, default: 1.
Notes
Required metric methods: dist, dist_pairwise.
Example
An example on the Poincaré ball and hypersphere manifolds is available in
examples.plot_kmedoids_manifolds.
Labels data by minimizing the distance between data points
and cluster centers chosen from the data points.
Minimization is performed by swapping the cluster centers and data points.
Parameters:
X (array-like, shape=[n_samples, dim]) – Training data, where n_samples is the number of samples and
dim is the number of dimensions.
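The swap-based minimization can be sketched on a precomputed pairwise distance matrix, which is all that the required dist_pairwise method would need to supply. This is an illustrative greedy variant with hypothetical function names, not the library's implementation.

```python
def total_cost(dist, medoids):
    """Sum over samples of the distance to the closest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def k_medoids(dist, initial_medoids, max_iter=100):
    """Greedy swap-based k-medoids on a square distance matrix."""
    medoids = list(initial_medoids)
    cost = total_cost(dist, medoids)
    for _ in range(max_iter):
        improved = False
        for pos in range(len(medoids)):
            for candidate in range(len(dist)):
                if candidate in medoids:
                    continue
                # Try replacing one medoid by a non-medoid data point.
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break
    labels = [
        min(range(len(medoids)), key=lambda k: dist[i][medoids[k]])
        for i in range(len(dist))
    ]
    return medoids, labels
```

Because the centers are always data points, the procedure only ever needs distances between samples and never has to compute a mean on the manifold.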
Classifier implementing the k-nearest neighbors vote on manifolds.
Parameters:
space (Manifold) – Equipped manifold.
n_neighbors (int, optional (default = 5)) – Number of neighbors to use by default.
weights (string or callable, optional (default = ‘uniform’)) – Weight function used in prediction. Possible values:
‘uniform’ : uniform weights. All points in each neighborhood
are weighted equally.
‘distance’ : weight points by the inverse of their distance.
In this case, closer neighbors of a query point will have a
greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an
array of distances, and returns an array of the same shape
containing the weights.
n_jobs (int or None, optional (default = None)) – The number of parallel jobs to run for neighbors search.
None means 1; -1 means using all processors.
The distance metric used. It will be the same as the distance parameter
or a synonym of it, e.g. ‘euclidean’ if the distance parameter is set to
‘minkowski’ and the p parameter is set to 2.
Additional keyword arguments for the distance function.
For most distances this will be the same as the distance_params parameter,
but it may also contain the p parameter value if the
effective_metric_ attribute is set to ‘minkowski’.
Minimum Distance to Mean (MDM) classifier on manifolds.
Classification by nearest centroid. For each of the given classes, a
centroid is estimated according to the chosen metric. Then, for each new
point, the class is assigned according to the nearest centroid [BBCJ2012].
A. Barachant, S. Bonnet, M. Congedo and C. Jutten, Multiclass
Brain-Computer Interface Classification by Riemannian Geometry. IEEE
Trans. Biomed. Eng., vol. 59, pp. 920-928, 2012.
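The nearest-centroid rule can be sketched independently of the manifold by passing the mean and distance as callables. The function names and the callable-based interface are assumptions for illustration; on a manifold, `mean` would be a Frechet mean estimator and `dist` the Riemannian distance.

```python
def fit_centroids(points, labels, mean):
    """Estimate one centroid per class with the supplied `mean` function."""
    by_class = {}
    for point, label in zip(points, labels):
        by_class.setdefault(label, []).append(point)
    return {label: mean(members) for label, members in by_class.items()}


def predict(point, centroids, dist):
    """Assign `point` to the class of its nearest centroid."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))
```

On the real line with the arithmetic mean and the absolute difference, two classes centered at 1 and 11 send the query 3 to the first class and the query 8 to the second.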
Online k-means clustering seeks to divide a set of data points into
a specified number of classes, while minimizing intra-class variance.
It is closely linked to discrete quantization, which computes the closest
approximation of the empirical distribution of the dataset by a discrete
distribution supported by a smaller number of points with respect to the
Wasserstein distance. The algorithm used can either be seen as an online
version of the k-means algorithm or as Competitive Learning Riemannian
Quantization (see [LBP2019]).
Parameters:
space (Manifold) – Equipped manifold. At each iteration,
one of the cluster centers is moved in the direction of the new
datum, according to the exponential map of the underlying space, which
is a method of the metric.
n_clusters (int) – Number of clusters of the k-means clustering, or number of desired
atoms of the quantized distribution.
n_repetitions (int, default=20) – The cluster centers are updated using decreasing step sizes, each
of which stays constant for n_repetitions iterations to allow a better
exploration of the data points.
max_iter (int, default=5e4) – Maximum number of iterations. If it is reached, the
quantization may be inaccurate.
[LBP2019] A. Le Brigant and S. Puechmorel, “Optimal Riemannian
quantization with an application to air traffic analysis.” J. Multivar.
Anal. 173 (2019), 685-703.
Perform online version of k-means algorithm on data contained in X.
The data points are treated sequentially and the cluster centers are
updated one at a time. This version of k-means avoids computing the
mean of each cluster at each iteration and is therefore less
computationally intensive than the offline version.
In the setting of quantization of probability distributions, this
algorithm is also known as Competitive Learning Riemannian Quantization.
It computes the closest approximation of the empirical distribution of
data by a discrete distribution supported by a smaller number of points
with respect to the Wasserstein distance. This smaller number of points
is n_clusters.
Parameters:
X (array-like, shape=[n_samples, n_features]) – Input data. It is treated sequentially by the algorithm, i.e.
one datum is chosen randomly at each iteration.
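The update rule described above can be sketched in plain Python on the unit sphere, where the exponential and logarithm maps have closed forms. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names (sphere_log, sphere_exp, online_kmeans), the step-size schedule 1 / (iteration // n_repetitions + 1), and the synthetic data are all hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sphere_dist(a, b):
    """Geodesic distance between two unit vectors."""
    return math.acos(max(-1.0, min(1.0, dot(a, b))))


def sphere_log(base, point):
    """Tangent vector at base whose length is the geodesic distance to point."""
    c = max(-1.0, min(1.0, dot(base, point)))
    ortho = [p - c * b for p, b in zip(point, base)]
    norm = math.sqrt(dot(ortho, ortho))
    if norm < 1e-12:
        return [0.0] * len(base)
    theta = math.acos(c)
    return [theta * o / norm for o in ortho]


def sphere_exp(base, tangent):
    """Point reached by following the geodesic from base along tangent."""
    norm = math.sqrt(dot(tangent, tangent))
    if norm < 1e-12:
        return list(base)
    return [math.cos(norm) * b + math.sin(norm) * t / norm
            for b, t in zip(base, tangent)]


def online_kmeans(points, n_clusters, n_repetitions=20, max_iter=2000, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    for iteration in range(max_iter):
        # Assumed schedule: the step decreases, but stays constant
        # for n_repetitions consecutive iterations.
        step = 1.0 / (iteration // n_repetitions + 1)
        datum = rng.choice(points)  # one datum chosen randomly
        k = min(range(n_clusters), key=lambda j: sphere_dist(centers[j], datum))
        direction = sphere_log(centers[k], datum)
        # Move only the closest center toward the datum along the geodesic.
        centers[k] = sphere_exp(centers[k], [step * d for d in direction])
    return centers


def noisy_point(center, rng, scale=0.05):
    raw = [c + rng.gauss(0.0, scale) for c in center]
    norm = math.sqrt(dot(raw, raw))
    return [r / norm for r in raw]


data_rng = random.Random(1)
poles = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
data = [noisy_point(poles[i % 2], data_rng) for i in range(200)]
centers = online_kmeans(data, n_clusters=2)
```

Only one center moves per iteration and no cluster mean is recomputed, which is what makes the online version cheaper than the offline one.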
Exact Principal Geodesic Analysis in the hyperbolic plane.
The first principal component is computed by finding the direction
in a unit ball around the mean that maximizes the variance of the
projections on the induced geodesic. The projections are given by
closed form expressions in extrinsic coordinates. The second principal
component is the direction at the mean that is orthogonal to the first
principal component.
Parameters:
space (Hyperbolic) – Two-dimensional hyperbolic space.
n_vec (int) – Number of vectors used to discretize the unit ball when finding
the direction of maximal variance.
R. Chakraborty, D. Seo, and B. C. Vemuri,
“An efficient exact-pga algorithm for constant curvature manifolds.”
Proceedings of the IEEE conference on computer vision and pattern
recognition. 2016.
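The direction search can be illustrated in a flat setting. The sketch below is a Euclidean stand-in and does not use the closed-form hyperbolic projections: it discretizes the unit circle at the mean into n_vec directions, keeps the one that maximizes the variance of the projections, and takes the orthogonal direction as the second component. The function name principal_directions and the sample data are hypothetical.

```python
import math


def principal_directions(points, n_vec=180):
    n = len(points)
    mean = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    centered = [(p[0] - mean[0], p[1] - mean[1]) for p in points]
    best_var, best_dir = -1.0, None
    for k in range(n_vec):
        # u and -u span the same line, so half a turn covers all directions.
        angle = math.pi * k / n_vec
        u = (math.cos(angle), math.sin(angle))
        var = sum((x * u[0] + y * u[1]) ** 2 for x, y in centered) / n
        if var > best_var:
            best_var, best_dir = var, u
    # The second component is orthogonal to the first at the mean.
    second = (-best_dir[1], best_dir[0])
    return mean, best_dir, second


# Points spread along the diagonal y = x with a small alternating offset.
offsets = [0.1, -0.1, 0.1, -0.1, 0.1]
points = [(t + e, t - e) for t, e in zip([-2.0, -1.0, 0.0, 1.0, 2.0], offsets)]
mean, first, second = principal_directions(points)
```

A larger n_vec gives a finer angular grid and therefore a more accurate first direction, at a cost that grows linearly in n_vec.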
X (array-like, shape=[…, n_features]) – Training data in the hyperbolic plane. If the space is
the Poincare half-space or Poincare ball, n_features is
2. If it is the hyperboloid, n_features is 3.
y (Ignored (Compliance with scikit-learn interface))
X (array-like, shape=[n_points, 2]) – Training data in the hyperbolic plane. If the space is
the Poincare half-space or Poincare ball, n_features is
2. If it is the hyperboloid, n_features is 3.
y (Ignored (Compliance with scikit-learn interface))
Returns:
X_new (array-like, shape=[n_components, n_points, 2]) – Projections of the data on the first principal geodesic (first line
of the array) and on the second principal geodesic (second line).
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
base_point (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for base_point parameter in fit.
Compute the logs of all data points and reshape them to
1d vectors if necessary. This means that all the data points, which belong
to a possibly non-linear manifold, are lifted to one of the tangent spaces of
the manifold, which is a vector space. By default, the mean of the data
is computed (with the FrechetMean or the ExponentialBarycenter estimator,
as appropriate) and the tangent space at the mean is used. Any other base
point can be passed. The data points are then represented by the initial
velocities of the geodesics that lead from base_point to each data point.
Any machine learning algorithm can then be used with the output array.
Parameters:
space (Manifold) – Equipped manifold or unequipped space implementing exp and log.
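As a rough illustration of the lift, the sketch below applies the logarithm map of the unit sphere at a fixed base point. It is a minimal stand-in written in plain Python, not the library's code: the helper names are hypothetical, and the base point is passed explicitly instead of being estimated as the mean of the data.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sphere_log(base, point):
    """Initial velocity of the geodesic leading from base to point."""
    c = max(-1.0, min(1.0, dot(base, point)))
    ortho = [p - c * b for p, b in zip(point, base)]
    norm = math.sqrt(dot(ortho, ortho))
    if norm < 1e-12:
        return [0.0] * len(base)
    theta = math.acos(c)
    return [theta * o / norm for o in ortho]


def to_tangent_space(points, base_point):
    """Represent each point by its log at base_point (a flat vector)."""
    return [sphere_log(base_point, p) for p in points]


base = [0.0, 0.0, 1.0]
points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
vectors = to_tangent_space(points, base)
```

Each output vector is orthogonal to the base point and its length equals the geodesic distance to the corresponding data point, so standard vector-space algorithms can be applied to the result.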
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
base_point (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for base_point parameter in fit.
weights (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for weights parameter in fit.
Compute the logs of all data points and reshape them to
1d vectors if necessary. By default, the logs are taken at the mean,
but any other base point can be passed. Any machine learning
algorithm can then be used with the output array.
Parameters:
X (array-like, shape=[n_samples, {dim, [n, n]}]) – Data to transform.
We chose not to apply the normalization coefficients that some references
use to make the kernel functions integrate to 1 on the Euclidean space of
dimension 1.
Class for Riemannian Mean Shift algorithm on manifolds.
Mean Shift is a procedure for locating the maxima (the modes) of a
density function, given discrete data sampled from that function. It is
an iterative method for finding the centers of a collection of clusters.
The following implementation assumes a flat kernel.
Parameters:
space (Manifold) – Equipped manifold.
bandwidth (float) – Size of the neighbourhood around each center. All points within
‘bandwidth’ of a center are used to compute the new mean center.
tol (float, default=1e-2) – Stopping condition. Computation of subsequent mean centers is stopped
when the distance between them is less than ‘tol’.
n_clusters (int, default=1) – Number of centers.
n_jobs (int, default=1) – Number of parallel threads to be initiated for parallel jobs.
max_iter (int, default=100) – Upper bound on total number of iterations for the centers to converge.
init_centers (str, default=“from_points”) – Initializing centers, either from the given input points or
random points uniformly distributed in the input manifold.
kernel (str, default=“flat”) – Weighting function to assign kernel weights to each center.
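The iteration can be sketched in the Euclidean plane, where the mean of the neighbours is an ordinary average. This is a simplified illustration under stated assumptions, not the Riemannian implementation, which would use the manifold distance and a mean computed on the manifold. The function name mean_shift, its signature, and the data are hypothetical.

```python
import math


def mean_shift(points, bandwidth, tol=1e-2, max_iter=100, init_centers=None):
    """Flat-kernel mean shift: every neighbour within bandwidth weighs the same."""
    centers = [tuple(c) for c in (init_centers or points)]
    for _ in range(max_iter):
        shifted = []
        for c in centers:
            neighbours = [p for p in points if math.dist(p, c) <= bandwidth]
            if not neighbours:
                shifted.append(c)
                continue
            dim = len(c)
            shifted.append(tuple(
                sum(p[i] for p in neighbours) / len(neighbours)
                for i in range(dim)
            ))
        # Stop once no center moved by more than tol.
        moved = max(math.dist(a, b) for a, b in zip(centers, shifted))
        centers = shifted
        if moved < tol:
            break
    return centers


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers = mean_shift(points, bandwidth=1.0,
                     init_centers=[(0.1, 0.0), (5.0, 5.1)])
```

With a flat kernel, each center simply moves to the unweighted mean of the points inside its bandwidth, so the two centers above settle on the two clumps after a couple of iterations.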
Mallasto, A. and Feragen, A.
“Wrapped gaussian process regression on riemannian manifolds.”
IEEE/CVF Conference on Computer Vision and Pattern Recognition
(2018)
Predict using the Gaussian process regression model.
A fitted Wrapped Gaussian process can be used to predict values
through the following steps:
Use the stored Gaussian process regression on the dataset to
return tangent predictions
Compute the base-points using the prior
Map the tangent predictions on the manifold via the metric’s exp
with the base-points yielded by the prior
We can also predict based on an unfitted model by using the GP prior.
In addition to the mean of the predictive distribution, optionally also
returns its standard deviation (return_tangent_std=True) or covariance
(return_tangent_cov=True). Note that at most one of the two can be requested.
Parameters:
X (array-like of shape (n_samples, n_features) or list of object) – Query points where the GP is evaluated.
return_tangent_std (bool, default=False) – If True, the standard deviation of the predictive distribution at
the query points in the tangent space is returned along with the mean.
return_tangent_cov (bool, default=False) – If True, the covariance of the joint predictive distribution at
the query points in the tangent space is returned along with the mean.
Returns:
y_mean (ndarray of shape (n_samples,) or (n_samples, n_targets)) – Mean of predictive distribution at query points.
y_std (ndarray of shape (n_samples,) or (n_samples, n_targets), optional) – Standard deviation of predictive distribution at query points in
the tangent space.
Only returned when return_tangent_std is True.
y_cov (ndarray of shape (n_samples, n_samples) or (n_samples, n_samples, n_targets), optional) – Covariance of joint predictive distribution at query points
in the tangent space.
Only returned when return_tangent_cov is True.
In the case where the target is matrix valued,
return the covariance of the vectorized prediction.
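The three prediction steps can be sketched on the unit sphere in plain Python. This is an illustration only: the fitted Gaussian process is replaced by a fixed stand-in function, and the names prior, tangent_prediction and wrapped_predict are hypothetical rather than part of any library API.

```python
import math


def sphere_exp(base, tangent):
    """Exponential map of the unit sphere at base."""
    norm = math.sqrt(sum(t * t for t in tangent))
    if norm < 1e-12:
        return list(base)
    return [math.cos(norm) * b + math.sin(norm) * t / norm
            for b, t in zip(base, tangent)]


def prior(x):
    """Hypothetical prior: a base point on the equator at longitude x."""
    return [math.cos(x), math.sin(x), 0.0]


def tangent_prediction(x):
    """Stand-in for the fitted GP mean: a tangent vector pointing north."""
    return [0.0, 0.0, 0.5 * x]


def wrapped_predict(queries):
    predictions = []
    for x in queries:
        tangent = tangent_prediction(x)  # step 1: tangent prediction
        base = prior(x)                  # step 2: base point from the prior
        predictions.append(sphere_exp(base, tangent))  # step 3: exp map
    return predictions


queries = [0.0, 1.0, 2.0]
predictions = wrapped_predict(queries)
```

The tangent vectors here are orthogonal to their base points by construction, so the exponential map returns points that lie on the sphere.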
Draw samples from Wrapped Gaussian process and evaluate at X.
A fitted Wrapped Gaussian process can be used to sample
values through the following steps:
Use the stored Gaussian process regression on the dataset
to sample tangent values
Compute the base-points using the prior
Flatten (and repeat if needed) both the base-points and the
tangent samples to benefit from vectorized computation.
Map the tangent samples on the manifold via the metric’s exp with the
flattened and repeated base-points yielded by the prior
Parameters:
X (array-like of shape (n_samples_X, n_features) or list of object) – Query points where the WGP is evaluated.
n_samples (int, default=1) – Number of samples drawn from the Wrapped Gaussian process per query
point.
random_state (int, RandomState instance or None, default=0) – Determines random number generation to randomly draw samples.
Pass an int for reproducible results across multiple function
calls.
Returns:
y_samples (ndarray of shape (n_samples_X, n_samples), or (n_samples_X, *target_shape, n_samples)) – Values of n_samples samples drawn from wrapped Gaussian process and
evaluated at query points.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
return_tangent_cov (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_tangent_cov parameter in predict.
return_tangent_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_tangent_std parameter in predict.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.