Datasets
Prepare EMG data
Preprocess time series into batched covariance matrices.
The user sets the number of time steps per batch. Preprocessing first removes the transient signal by discarding a margin on each side of every change of hand sign. It then splits the remaining data into batches, from which the covariance matrices are built. In practice, the batch size should be large enough to carry sufficient information and small enough for the online classifier to stay reactive.
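The batching step described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `batch_starts` and the sample labels are hypothetical, and the sketch also trims the margin at the very start and end of the recording for simplicity.

```python
def batch_starts(labels, n_steps, margin=0):
    """Return start indices of fixed-size batches inside constant-label runs."""
    # Indices i where the label at i differs from the label at i - 1.
    changes = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
    # Each run of a single label spans [bounds[k], bounds[k + 1]).
    bounds = [0] + changes + [len(labels)]
    starts = []
    for begin, end in zip(bounds[:-1], bounds[1:]):
        # Trim `margin` samples on both sides of the run to drop transients.
        lo, hi = begin + margin, end - margin
        # Tile the trimmed run with non-overlapping batches of n_steps samples.
        starts.extend(range(lo, hi - n_steps + 1, n_steps))
    return starts


# Hypothetical label sequence: 10 samples of one sign, then 12 of another.
labels = ["rock"] * 10 + ["paper"] * 12
starts = batch_starts(labels, n_steps=3, margin=2)
```

Each returned index marks a window of `n_steps` samples that lies within a single label, so a covariance matrix computed on that window is not contaminated by a transition.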

class geomstats.datasets.prepare_emg_data.TimeSeriesCovariance(data, n_steps, n_timeseries, label_map, margin=0)
Class for generating a list of covariance matrices from time series.
Prepare a TimeSeriesCovariance object from time series stored in a dictionary.
 Parameters
data_dict (dict) – Dictionary with ‘time’, ‘raw_data’, ‘label’ as keys and the corresponding arrays as values.
n_steps (int) – Size of the batches.
n_timeseries (int) – The number of electrodes used for the recording.
label_map (dictionary) – Encode the label into digits.
margin (int) – Number of indices to remove before and after a sign change (can help obtain a stationary signal).

label_map
Encode the label into digits.
 Type
dictionary

data_dict
Dictionary with ‘time’, ‘raw_data’, ‘label’ as keys and the corresponding arrays as values.
 Type
dict

n_steps
Size of the batches.
 Type
int

n_timeseries
The number of electrodes used for the recording.
 Type
int

batches
Start indices of the batches used to compute the covariance matrices.
 Type
array

margin
Number of indices to remove before and after a sign change (can help obtain a stationary signal).
 Type
int

covs
The covariance matrices.
 Type
array

labels
The digit labels corresponding to each batch.
 Type
array

covec
The vectorized version of the covariance matrices.
 Type
array

diags
The diagonals of the covariance matrices.
 Type
array
Prepare Graph Data
Prepare and process graph-structured data.

class geomstats.datasets.prepare_graph_data.Graph(graph_matrix_path, labels_path)
Class for generating a graph object from a dataset.
Prepare Graph object from a dataset file.
 Parameters
graph_matrix_path (string) – Path to graph adjacency matrix.
labels_path (string) – Path to labels of the nodes of the graph.

edges
Dictionary with node number as key and the numbers of the nodes connected to it by an edge as values.
 Type
dict

n_nodes
Number of nodes in the graph.
 Type
int

labels
Dictionary with node number as key and the true label number as values.
 Type
dict

random_walk(walk_length=5, n_walks_per_node=1)
Compute a set of random walks on a graph.
For each node of the graph, generates a number of random walks of a specified length. Two consecutive nodes in a random walk are necessarily connected by an edge. The walks capture the structure of the graph.
 Parameters
walk_length (int) – Length of a random walk in terms of number of edges.
n_walks_per_node (int) – Number of generated walks starting from each node of the graph.
 Returns
self (array-like) – Shape=[n_walks_per_node*self.n_edges, walk_length] array containing random walks.
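
The random-walk procedure described above can be illustrated with a short standalone sketch. It is not the library's code: the function `random_walks`, its `seed` argument, and the toy `edges` dictionary are hypothetical, and the sketch assumes every node has at least one neighbor. Here each walk takes `walk_length` steps, so it lists `walk_length + 1` nodes.

```python
import random


def random_walks(edges, walk_length=5, n_walks_per_node=1, seed=None):
    """Generate uniform random walks from every node of an adjacency dict."""
    rng = random.Random(seed)
    walks = []
    for start in edges:
        for _ in range(n_walks_per_node):
            walk = [start]
            for _ in range(walk_length):
                # Step to a uniformly chosen neighbor of the current node.
                walk.append(rng.choice(edges[walk[-1]]))
            walks.append(walk)
    return walks


# Toy graph in the same form as the `edges` attribute: node -> neighbors.
edges = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(edges, walk_length=4, n_walks_per_node=2, seed=0)
```

Because every step moves along an existing edge, nodes that co-occur in a walk are close in the graph, which is what makes the walks usable as context for learning embeddings.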

class geomstats.datasets.prepare_graph_data.HyperbolicEmbedding(dim=2, max_epochs=100, lr=0.05, n_context=1, n_negative=2)
Class for learning embeddings of graphs on hyperbolic space.
 Parameters
dim (int) – Dimension of the hyperbolic space used.
max_epochs (int) – Maximum number of iterations for embedding.
lr (float) – Learning rate for embedding.
n_context (int) – Number of nodes to consider from a neighborhood of nodes around a particular node.
n_negative (int) – Number of nodes to consider when searching for a set of nodes that are far from a particular node.

embed(graph)
Compute embedding.
Optimize a loss function to obtain a representable embedding.
 Parameters
graph (object) – An instance of the Graph class.
 Returns
embeddings (array-like, shape=[n_samples, dim]) – Return the embedding of the data. Each data sample is represented as a point belonging to the manifold.

static grad_log_sigmoid(vector)
Gradient of log sigmoid function.
 Parameters
vector (array-like, shape=[n_samples, dim])
 Returns
gradient (array-like, shape=[n_samples, dim])

grad_squared_distance(point_a, point_b)
Gradient of squared hyperbolic distance.
Gradient of the squared distance in the ball representation, taken with respect to point_a.
 Parameters
point_a (array-like, shape=[n_samples, dim]) – First point in hyperbolic space.
point_b (array-like, shape=[n_samples, dim]) – Second point in hyperbolic space.
 Returns
dist (array-like, shape=[n_samples, 1]) – Geodesic squared distance between the two points.

static log_sigmoid(vector)
Log-sigmoid function.
Apply log sigmoid function.
 Parameters
vector (array-like, shape=[n_samples, dim])
 Returns
result (array-like, shape=[n_samples, dim])
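
For reference, the log-sigmoid function and its gradient can be written for a single scalar as below. This is a sketch of the mathematics only, using log σ(x) = -log(1 + exp(-x)) and its derivative σ(-x); the scalar signatures and the numerically stable rearrangements are choices made here and are not taken from the library, whose methods operate on arrays.

```python
import math


def log_sigmoid(x):
    """Return log(1 / (1 + exp(-x))) without overflow for large |x|."""
    # For x >= 0 this is -log1p(exp(-x)); for x < 0 it is x - log1p(exp(x)).
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def grad_log_sigmoid(x):
    """Return d/dx log_sigmoid(x) = 1 / (1 + exp(x))."""
    if x >= 0:
        # Rewrite with exp(-x) so the exponential cannot overflow.
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


value_at_zero = log_sigmoid(0.0)      # equals -log(2)
slope_at_zero = grad_log_sigmoid(0.0)  # equals 0.5
```

The gradient decays to 0 for large positive inputs and approaches 1 for large negative inputs, which is what lets the loss saturate once a pair of points is already well separated or well aligned.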

loss(example_embedding, context_embedding, negative_embedding)
Compute loss and grad.
Compute loss and grad given embedding of the current example, embedding of the context and negative sampling embedding.
 Parameters
example_embedding (array-like, shape=[dim]) – Current data sample embedding.
context_embedding (array-like, shape=[dim]) – Current context embedding.
negative_embedding (array-like, shape=[dim]) – Current negative sample embedding.
 Returns
total_loss (float) – The current value of the loss function.
example_grad (array-like, shape=[dim]) – The gradient of the loss function at the embedding of the current data sample.
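
To make the roles of the context and negative embeddings concrete, here is a sketch of one common negative-sampling loss on the Poincaré ball. It is an assumption that this form resembles what the class optimizes; the function names and sample points are hypothetical, and the sketch computes only the loss value, not the gradient.

```python
import math


def sq_norm(v):
    """Squared Euclidean norm of a coordinate list."""
    return sum(c * c for c in v)


def poincare_sq_dist(x, y):
    """Squared geodesic distance between two points of the open unit ball."""
    diff = [a - b for a, b in zip(x, y)]
    arg = 1.0 + 2.0 * sq_norm(diff) / ((1.0 - sq_norm(x)) * (1.0 - sq_norm(y)))
    return math.acosh(arg) ** 2


def log_sigmoid(t):
    """Numerically stable log(1 / (1 + exp(-t)))."""
    return min(t, 0.0) - math.log1p(math.exp(-abs(t)))


def negative_sampling_loss(example, context, negatives):
    """Small when `context` is near `example` and all `negatives` are far."""
    attract = log_sigmoid(-poincare_sq_dist(example, context))
    repel = sum(log_sigmoid(poincare_sq_dist(example, n)) for n in negatives)
    return -(attract + repel)


# Hypothetical 2D embeddings inside the unit disk.
example, near, far = [0.1, 0.0], [0.12, 0.0], [-0.8, 0.3]
good = negative_sampling_loss(example, near, [far])
bad = negative_sampling_loss(example, far, [near])
```

Swapping the roles of the near and far points raises the loss, so minimizing it pulls context nodes toward the example and pushes negative samples away.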
Utils
Loading toy datasets.
Refer to notebook: geomstats/notebooks/01_data_on_manifolds.ipynb to visualize these datasets.

geomstats.datasets.utils.load_cities()
Load data from data/cities/cities.json.
 Returns
data (array-like, shape=[50, 2]) – Array with each row representing one sample, i.e. latitude and longitude of a city. Angles are in radians.
name (list) – List of city names.
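
Since the rows hold latitude and longitude in radians, each city can be mapped to a point on the unit sphere. The sketch below shows the standard conversion and the resulting great-circle angle; the function names are hypothetical and the two coordinate pairs are illustrative values typed in by hand, not read from cities.json.

```python
import math


def latlon_to_sphere(lat, lon):
    """Map latitude and longitude in radians to a unit vector in 3D."""
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def great_circle_distance(p, q):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    return math.acos(max(-1.0, min(1.0, dot)))


# Illustrative coordinates, converted from degrees to radians.
paris = latlon_to_sphere(math.radians(48.8566), math.radians(2.3522))
london = latlon_to_sphere(math.radians(51.5074), math.radians(-0.1278))
angle = great_circle_distance(paris, london)
```

Multiplying the angle by the Earth's radius gives an approximate distance along the surface.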

geomstats.datasets.utils.load_connectomes(as_vectors=False)
Load data from brain connectomes.
Load the correlation data from the Kaggle MLSP 2014 Schizophrenia Challenge. The original data came as flattened vectors; unless as_vectors=True is passed, the correlation values are reshaped as symmetric matrices with ones on the diagonal.
 Parameters
as_vectors (bool) – Whether to return the raw data as vectors (True) or as symmetric matrices (False). Optional, default: False
 Returns
mat (array-like, shape=[86, {[28, 28], 378}]) – Connectomes.
patient_id (array-like, shape=[86,]) – Patient unique identifiers.
target (array-like, shape=[86,]) – Labels, whether patients belong to the diseased class (1) or control (0).
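
The two shapes are related by 28 × 27 / 2 = 378: a vector holds the off-diagonal entries of one triangle of a 28 × 28 symmetric matrix. The sketch below rebuilds such a matrix from a vector. The helper name and the sample values are hypothetical, and the row-by-row ordering of the upper triangle is an assumption that may differ from the ordering used by the dataset.

```python
def vector_to_corr_matrix(vec, n=28):
    """Rebuild an n x n symmetric matrix with unit diagonal from a vector."""
    if len(vec) != n * (n - 1) // 2:
        raise ValueError("expected n * (n - 1) / 2 off-diagonal entries")
    # Start from the identity so the diagonal holds ones.
    mat = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Fill both triangles so the result is symmetric.
            mat[i][j] = mat[j][i] = vec[k]
            k += 1
    return mat


# Hypothetical flattened connectome with 378 distinct values.
vec = [0.001 * k for k in range(378)]
mat = vector_to_corr_matrix(vec)
```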

geomstats.datasets.utils.load_emg()
Load data from data/emg/emg.csv.
 Returns
data_emg (pandas.DataFrame, shape=[731682, 10]) – EMG time series for each of the 8 electrodes, with the time stamps and the label of the hand sign.

geomstats.datasets.utils.load_hands()
Load data from data/hands/hands.txt and labels.txt.
Load the dataset of hand poses, where a hand is represented as a set of 22 landmarks (the hand’s joints) in 3D.
The dataset contains two different hand poses:
Label 0: hand is in the position “Grab”
Label 1: hand is in the position “Expand”
This is a subset of the SHREC 2017 dataset [SWVGLF2017].
References
 SWVGLF2017
Q. De Smedt, H. Wannous, J.-P. Vandeborre, J. Guerry, B. Le Saux, D. Filliat. SHREC’17 Track: 3D Hand Gesture Recognition Using a Depth and Skeletal Dataset. 10th Eurographics Workshop on 3D Object Retrieval, 2017. https://doi.org/10.2312/3dor.20171049
 Returns
data (array-like, shape=[52, 22, 3]) – Hand data, represented as a list of 22 joints, specifically as the 3D coordinates of these joints.
labels (array-like, shape=[52,]) – Label representing hand poses. Label 0: “Grab”, Label 1: “Expand”
bone_list (array-like) – List of bones, as a list of connections between joints.

geomstats.datasets.utils.load_karate_graph()
Load data from data/graph_karate.
 Returns
graph (prepare_graph_data.Graph) – Graph containing nodes, edges, and labels from the karate dataset.

geomstats.datasets.utils.load_leaves()
Load data from data/leaves/leaves.xlsx.
 Returns
beta_param (array-like, shape=[172, 2]) – Beta parameters of the beta distributions fitted to each leaf orientation angle sample of 172 species of plants.
distrib_type (array-like, shape=[172,]) – Leaf orientation angle distribution type for each of the 172 species.

geomstats.datasets.utils.load_optical_nerves()
Load data from data/optical_nerves/optical_nerves.txt.
Load the dataset of sets of 5 landmarks, labelled S, T, I, N, V, in 3D on monkeys’ optical nerve heads:
1st landmark (S): superior aspect of the retina,
2nd landmark (T): side of the retina closest to the temporal bone of the skull,
3rd landmark (N): nose side of the retina,
4th landmark (I): inferior point,
5th landmark (V): optical nerve head deepest point.
For each monkey, an experimental glaucoma was introduced in one eye, while the second eye was kept as control. This dataset can be used to investigate a significant difference between the glaucoma and the control eyes.
Label 0 refers to a normal eye, and Label 1 to an eye with glaucoma.
References
 PE2015
V. Patrangenaru and L. Ellingson. Nonparametric Statistics on Manifolds and Their Applications to Object Data, 2015. https://doi.org/10.1201/b18969
 Returns
data (array-like, shape=[22, 5, 3]) – Data representing the 5 landmarks, in 3D, for 11 different monkeys.
labels (array-like, shape=[22,]) – Labels in {0, 1} classifying the corresponding optical nerve as normal (label = 0) or glaucoma (label = 1).
monkeys (array-like, shape=[22,]) – Indices in 0…10 referencing the index of the monkey to which a given optical nerve belongs.

geomstats.datasets.utils.load_poses(only_rotations=True)
Load data from data/poses/poses.csv.
 Returns
data (array-like, shape=[5, 3] or shape=[5, 6]) – Array with each row representing one sample, i.e. one 3D rotation or one 3D rotation + 3D translation.
img_paths (list) – List of image paths.