Notebook source code: notebooks/06_information_geometry.ipynb
Run it yourself on binder Binder badge

Tutorial: Information geometry

Disclaimer: this notebook requires the use of the numpy backend.


Information geometry is a branch of mathematics at the crossroad of statistics and differential geometry, focused on the study of probability distributions from a geometric point of view. Although it has been extended to the non parametric case, this framework is foremost concerned with parameteric families of probability distributions. The space of parameters, an open subset of the Euclidean space, is seen as a differential manifold and the Fisher information matrix is used to define a Riemannian metric on it, called the Fisher-Rao metric or Fisher information metric. This metric is invariant under change of parameterization, which makes it a natural choice. Moreover it is the only Riemannian metric compatible with the notion of information contained by the model on the parameter, in the sense that it is the only metric that preserves the geometry of a parametric model after transformation by a sufficient statistic (Cencov’s theorem). For an overview, see [A2016].


Before starting this tutorial, we set the working directory to be the root of the geomstats repository. In order to have the code working on your machine, you need to change this path to the path of your geomstats repository.

 In [1]:
import os
import subprocess

geomstats_gitroot_path = subprocess.check_output(
    ['git', 'rev-parse', '--show-toplevel'],


print('Working directory: ', os.getcwd())
Working directory:  /Users/alicelebrigant/ownCloud/Python/SpyderProjects/geomstats
 In [2]:
import matplotlib
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

import geomstats.backend as gs
import geomstats.visualization as visualization
INFO: Using numpy backend

Normal distributions

The Fisher-Rao geometry of the family of normal distributions is arguably the simplest example of two-dimensional information geometry. The space of parameters is the upper half-plane where the x-coordinate encodes the mean and the y-coordinate the standard deviation. Quite remarkably, the Fisher-Rao metric induces the hyperbolic geometry of the Poincare half plane [AM1981]. To start, we need an instance of the class NormalDistributions and its Fisher-Rao metric.

 In [3]:
from geomstats.geometry.normal_distributions import NormalDistributions

normal = NormalDistributions()
fisher_metric = normal.metric

Using the visualization module, we can plot the geodesic between two points, each defining the parameters (mean and standard deviation) for a normal distribution. We recognise the shape of a geodesic of the Poincare half-plane, namely a half-circle orthogonal to the x-axis.

 In [4]:
point_a = gs.array([1., 1.])
point_b = gs.array([3., 1.])

geodesic_ab_fisher = fisher_metric.geodesic(point_a, point_b)

n_points = 20
t = gs.linspace(0, 1, n_points)
 In [5]:
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(111)
cc = gs.zeros((n_points, 3))
cc[:, 2] = gs.linspace(0, 1, n_points)

    geodesic_ab_fisher(t), ax=ax, space='H2_poincare_half_plane', label='point on geodesic', color=cc)

ax.set_xlim(0., 4.)
ax.set_ylim(0., 2.)
ax.set_title('Geodesic between two normal distributions for the Fisher-Rao metric')

Each point of the geodesic defines a normal distribution, and so we obtain an optimal interpolation between the distributions corresponding to point_a and point_b, which we can visualize in terms of probability density functions.

 In [6]:
pdfs = normal.point_to_pdf(geodesic_ab_fisher(t))
x = gs.linspace(-3., 7., 100)

fig = plt.figure(figsize=(10, 5))
for i in range(n_points):
    plt.plot(x, pdfs(x)[:, i], color=cc[i, :])
plt.title('Corresponding interpolation between pdfs');

Another possibility to compare probability distributions is given by the \(L^2\)-Wasserstein metric, central in optimal transport. In the case of normal distributions, the \(L^2\)-Wasserstein metric induces the Euclidean geometry on the upper half plane [BGKL2017]. Therefore, the Wasserstein distance between two normal distributions with different means and same variance (point_a and point_b) will not change when this common variance is increased (point_c and point_d), while the corresponding Fisher-Rao distance will decrease, as can be deduced from the shape of the geodesic. This can be interpreted as a consequence of the increasing overlap of the corresponding probability densities.

 In [7]:
from geomstats.geometry.euclidean import Euclidean

plane = Euclidean(2)
wasserstein_metric = plane.metric

point_c = gs.array([1., 3.])
point_d = gs.array([3., 3.])

geodesic_cd_fisher = fisher_metric.geodesic(point_c, point_d)
geodesic_ab_wasserstein = wasserstein_metric.geodesic(point_a, point_b)
geodesic_cd_wasserstein = wasserstein_metric.geodesic(point_c, point_d)

points = gs.stack((point_a, point_b, point_c, point_d))
pdfs = normal.point_to_pdf(points)
 In [8]:
%matplotlib inline

fig = plt.figure(figsize=(12, 5))
ax1 = fig.add_subplot(121)

    gs.vstack((geodesic_ab_fisher(t), geodesic_cd_fisher(t))),
    ax=ax1, space='H2_poincare_half_plane', label='Fisher-Rao geodesic',
    gs.vstack((geodesic_ab_wasserstein(t), geodesic_cd_wasserstein(t))),
    ax=ax1, space='H2_poincare_half_plane', label='Wasserstein geodesic',
    color='black', alpha=0.5)
    gs.stack((point_a, point_b)), ax=ax1, space='H2_poincare_half_plane',
    label='points a and b', s=100)
    gs.stack((point_c, point_d)), ax=ax1, space='H2_poincare_half_plane',
    label='points c and d', s=100)

ax1.set_xlim(0., 4.)
ax1.set_ylim(0., 4.)

ax2 = fig.add_subplot(122)
x = gs.linspace(-3., 7., 100)
lines = [Line2D([0], [0], color='C0'),
         Line2D([0], [0], color='C1')]
ax2.plot(x, pdfs(x)[:, :2], c='C0')
ax2.plot(x, pdfs(x)[:, 2:], c='C1')
ax2.legend(lines, ['pdfs a and b', 'pdfs c and d']);

Beta distributions

Let us now consider the example of beta distributions, where the space of parameters is the first quadrant. In this case, the geodesics for the Fisher-Rao metric do not have a closed form, but can be found numerically [LGRP2020]. Here we plot an example of geodesic ball.

 In [3]:
from geomstats.geometry.beta_distributions import BetaDistributions

beta = BetaDistributions()
 In [10]:
n_rays = 50
center = gs.array([2., 2.])
theta = gs.linspace(-gs.pi, gs.pi, n_rays)
directions = gs.transpose(
    gs.stack((gs.cos(theta), gs.sin(theta))))

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111)
ray_length = 0.25
direction_norms = beta.metric.squared_norm(directions, center)**(1/2)
unit_vectors = directions/gs.expand_dims(direction_norms, 1)
initial_vectors = ray_length * unit_vectors

n_points = 10
t = gs.linspace(0., 1., n_points)
for j in range(n_rays):
    geod = beta.metric.geodesic(
        initial_point=center, initial_tangent_vec=initial_vectors[j, :])
    ax.plot(*gs.transpose(gs.array([geod(k) for k in t])))
ax.set_xlim(1, 3)
ax.set_ylim(1, 3)
ax.set_title('Geodesic ball of the space of beta distributions');

Now we consider the dataset leaves, which contains parameters of beta distributions fitted to inclination angles of leaves of different plant species (A dataset of leaf inclination angles for temperate and boreal broadleaf woody species, These species are divided into 5 categories according to inclination angle distribution type: spherical, erectophile, uniform, planophile and plagiophile. For each species, a beta distribution is fitted to the set of inclination angle measurements. Each species can therefore be represented by a point in the manifold of beta distributions.

 In [4]:
import geomstats.datasets.utils as data_utils

beta_param, distrib_type = data_utils.load_leaves()
 In [5]:
fig = plt.figure(figsize=(5, 5))
for distrib in set(distrib_type):
    points = beta_param[distrib_type==distrib, :]
    plt.plot(points[:, 0], points[:, 1], 'o', label=distrib)
plt.title('Beta parameters of the leaf inclination angle distributions of 172 different species')

Using the FrechetMean learning class, we can compute the leaf inclination angle mean distribution among the species of type ‘planophile’.

 In [6]:
from geomstats.learning.frechet_mean import FrechetMean

points_plan = beta_param[distrib_type=='planophile', :]

mean = FrechetMean(metric=beta.metric)

mean_estimate = mean.estimate_
 In [7]:
fig = plt.figure(figsize=(5, 5))
plt.plot(points_plan[:, 0], points_plan[:, 1], 'o', label='planophile')
plt.plot(*mean_estimate, 'o', markersize=10, label='mean planophile')
plt.title('Beta parameters of the leaf inclination angle mean distribution '
          'of species of planophile type')


  1. Amari. Information geometry and its applications. Vol. 194. Springer, 2016.

  1. Atkinson and A. FS Mitchell. Rao’s distance measure. Sankhya: The Indian Journal of Statistics. Series A, pp. 345–365, 1981.

  1. Bigot, R. Gouet, T. Klein and A. López. Geodesic PCA in the Wasserstein space by convex PCA. In Annales de l’Institut Henri Poincaré, Probabilités et Statistiques. Vol. 53. No. 1. Institut Henri Poincaré, 2017.

  1. Le Brigant, N. Guigui, S. Rebbah and S. Puechmorel, Classifying histograms of medical data using information geometry of beta distributions. arXiv preprint arXiv:2006.04511, 2020.