{ "cells": [ { "cell_type": "markdown", "id": "87d91a42", "metadata": {}, "source": [ "# Align all and Compute for Graphs\n", "\n", "$\\textbf{Lead Author: Anna Calissano}$" ] }, { "cell_type": "markdown", "id": "0e4098a0", "metadata": {}, "source": [ "Dear learner, \n", "\n", "the aim of this notebook is to introduce Align All and Compute (AAC) as a learning method for graphs. AAC can be used to estimate the Frechet Mean, the Generalized Geodesic Principal Components and a Regression model. In this notebook you will learn how to use each of these learning methods." ] }, { "cell_type": "code", "execution_count": 1, "id": "319481d6", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO: Using numpy backend\n" ] } ], "source": [ "import random\n", "\n", "import networkx as nx\n", "\n", "import geomstats.backend as gs\n", "from geomstats.geometry.stratified.graph_space import GraphSpace\n", "from geomstats.learning.aac import AAC\n", "\n", "gs.random.seed(2020)" ] }, { "cell_type": "markdown", "id": "e476384d", "metadata": {}, "source": [ "Let's start by creating simulated data using `networkx`." ] }, { "cell_type": "code", "execution_count": 2, "id": "5d2c46cb", "metadata": { "tags": [ "nbsphinx-thumbnail" ] }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "graphset_1 = gs.array(\n", " [\n", " nx.to_numpy_array(nx.erdos_renyi_graph(n=5, p=0.6, directed=True))\n", " for i in range(10)\n", " ]\n", ")\n", "graphset_2 = gs.array(\n", " [\n", " nx.to_numpy_array(nx.erdos_renyi_graph(n=5, p=0.6, directed=True))\n", " for i in range(100)\n", " ]\n", ")\n", "graphset_3 = gs.array(\n", " [\n", " nx.to_numpy_array(nx.erdos_renyi_graph(n=3, p=0.6, directed=True))\n", " for i in range(1000)\n", " ]\n", ")\n", "\n", "nx.draw(nx.from_numpy_array(graphset_1[0]))" ] }, { "cell_type": "markdown", "id": "d766dd07", "metadata": {}, "source": [ "### A primer in space, metric and aligners" ] }, { "cell_type": "markdown", "id": "e0e65cec", "metadata": {}, "source": [ "The first step is to create the total space and then add quotient structure to it." ] }, { "cell_type": "code", "execution_count": 3, "id": "bade9271", "metadata": {}, "outputs": [], "source": [ "total_space = GraphSpace(n_nodes=5)\n", "total_space.equip_with_group_action() # permutations by default\n", "\n", "graph_space = total_space.equip_with_quotient()" ] }, { "cell_type": "markdown", "id": "36336c04", "metadata": {}, "source": [ "By default, the total space comes equipped with the Frobenius metric (`MatricesMetric`) and graph space with a quotient metric." 
 ] }, { "cell_type": "markdown", "id": "a8656a77", "metadata": {}, "source": [ "With the FAQ alignment and the default Frobenius norm on the total space, we align a single graph, and then a set of graphs, to a base graph:" ] }, { "cell_type": "code", "execution_count": 4, "id": "4074d7e9", "metadata": {}, "outputs": [], "source": [ "permuted_graph = total_space.aligner.align(graphset_1[1], graphset_1[0])\n", "\n", "permuted_graphs = total_space.aligner.align(graphset_1[1:3], graphset_1[0])" ] }, { "cell_type": "markdown", "id": "5d7bfb1e", "metadata": {}, "source": [ "To compute the distance, we can either call the graph space distance function:" ] }, { "cell_type": "code", "execution_count": 5, "id": "abe70991", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.6457513110645907" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graph_space.metric.dist(graphset_1[0], graphset_1[1])" ] }, { "cell_type": "markdown", "id": "440de4b1", "metadata": {}, "source": [ "Or, if the matching has already been done, we can use the total space distance on the aligned graph, which avoids computing the matching twice:" ] }, { "cell_type": "code", "execution_count": 6, "id": "8fb68953", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.6457513110645907" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "total_space.metric.dist(graphset_1[0], permuted_graph)" ] }, { "cell_type": "markdown", "id": "aefc53f6", "metadata": {}, "source": [ "We can also align points to geodesics:" ] }, { "cell_type": "code", "execution_count": 7, "id": "488ff262", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "init_point, end_point = graph_space.random_point(2)\n", "\n", "geodesic_func = graph_space.metric.geodesic(init_point, end_point)\n", "\n", "aligned_init_point = total_space.aligner.align_point_to_geodesic(\n", " geodesic_func, 
init_point\n", ")\n", "\n", "total_space.metric.dist(init_point, aligned_init_point)" ] }, { "cell_type": "markdown", "id": "18fe3623", "metadata": {}, "source": [ "This short introduction should be enough to set you up for experimenting with the learning algorithms on graphs." ] }, { "cell_type": "markdown", "id": "7dc0c7a5", "metadata": {}, "source": [ "### Frechet Mean\n", "Reference: Calissano, A., Feragen, A., & Vantini, S. (2020). Populations of unlabeled networks: Graph space geometry and geodesic principal components. MOX Report.\n", "\n", "Given $\\{[X_1], \\dots, [X_k]\\}, [X_i] \\in X/T$, we estimate the Frechet Mean using AAC, which alternates between two steps:\n", "1. Compute $\\hat{X}$ as the arithmetic mean of $\\{X_1, \\dots, X_k\\}, X_i \\in X$ \n", "2. Use graph-to-graph alignment to find $\\{X_1, \\dots, X_k\\}, X_i \\in X$ optimally aligned with $\\hat{X}$" ] }, { "cell_type": "markdown", "id": "cfbe01a3", "metadata": {}, "source": [ "Let's instantiate the graph space." ] }, { "cell_type": "code", "execution_count": 8, "id": "e1b4b3d1", "metadata": {}, "outputs": [], "source": [ "total_space = GraphSpace(n_nodes=5)\n", "total_space.equip_with_group_action()\n", "total_space.equip_with_quotient();" ] }, { "cell_type": "markdown", "id": "fa8ede70", "metadata": {}, "source": [ "Now create the estimator and fit the data." ] }, { "cell_type": "code", "execution_count": 9, "id": "5073938a", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING: Maximum number of iterations 20 reached. The estimate may be inaccurate\n" ] }, { "data": { "text/plain": [ "array([[0. , 0.9 , 0.37, 0.9 , 0.59],\n", " [0.61, 0. , 0.25, 0.83, 0.66],\n", " [0.29, 0.79, 0. , 0.44, 0.49],\n", " [0.19, 0.61, 0.23, 0. , 0.29],\n", " [0.87, 0.92, 0.73, 0.86, 0. 
]])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aac_fm = AAC(space=total_space, estimate=\"frechet_mean\", max_iter=20)\n", "\n", "fm = aac_fm.fit(graphset_2)\n", "\n", "fm.estimate_" ] }, { "cell_type": "markdown", "id": "83e636a5", "metadata": {}, "source": [ "### Principal Components\n", "Reference: Calissano, A., Feragen, A., & Vantini, S. (2020). Populations of unlabeled networks: Graph space geometry and geodesic principal components. MOX Report.\n", "\n", "We estimate the Generalized Geodesic Principal Components (GGPCA) using AAC. Given $\\{[X_1], \\dots, [X_k]\\}, [X_i] \\in X/T$, we are searching for\n", "$\\gamma: \\mathbb{R}\\rightarrow X/T$, a generalized geodesic principal component capturing the majority of the variability of the dataset. The AAC for GGPCA alternates between two steps: \n", "\n", "1. finding $\\delta: \\mathbb{R}\\rightarrow X$, a principal component of the set of adjacency matrices $\\{X_1, \\dots, X_k\\}, X_i \\in X$ \n", "2. finding $\\{X_1, \\dots, X_k\\}, X_i \\in X$ optimally aligned with respect to $\\gamma$. This step requires a point-to-geodesic alignment." ] }, { "cell_type": "markdown", "id": "cb886a18", "metadata": {}, "source": [ "As before:" ] }, { "cell_type": "code", "execution_count": 10, "id": "c1eae258", "metadata": {}, "outputs": [], "source": [ "total_space = GraphSpace(n_nodes=3)\n", "total_space.equip_with_group_action()\n", "total_space.equip_with_quotient();" ] }, { "cell_type": "markdown", "id": "7d2c0141", "metadata": {}, "source": [ "For GGPCA, we also need the point-to-geodesic aligner." ] }, { "cell_type": "markdown", "id": "9754a5aa", "metadata": {}, "source": [ "Again, create the estimator and fit the data." 
 ] }, { "cell_type": "code", "execution_count": 11, "id": "434fef57", "metadata": { "scrolled": true }, "outputs": [], "source": [ "aac_ggpca = AAC(space=total_space, estimate=\"ggpca\", n_components=2)\n", "\n", "aac_ggpca.fit(graphset_3);" ] }, { "cell_type": "markdown", "id": "7c85724c", "metadata": {}, "source": [ "### Regression\n", "Reference: Calissano, A., Feragen, A., & Vantini, S. (2022). Graph-valued regression: Prediction of unlabelled networks in a non-Euclidean graph space. Journal of Multivariate Analysis, 190, 104950.\n", "\n", "We estimate a graph-valued regression model to predict graphs from scalars or vectors. Given $\\{(s_1,[X_1]), \\dots, (s_k, [X_k])\\}, (s_i,[X_i]) \\in \\mathbb{R}^p\\times X/T $ we are searching for:\n", "$$f: \\mathbb{R}^p\\rightarrow X/T$$\n", "where $f\\in \\mathcal{F}(X/T)$ is a generalized geodesic regression model, i.e., the canonical projection onto Graph Space of a regression line $h_\\beta : \\mathbb{R}^p\\rightarrow X$ of the form $$h_\\beta(s) = \\sum_{j=1}^{p} \\beta_j s_j$$\n", "The AAC algorithm for regression alternates between estimating $h_\\beta$ given $\\{X_1, \\dots, X_k\\}, X_i \\in X$, by minimizing\n", "$$\\sum_{i=1}^{k} d_X(h_\\beta(s_i), X_i)$$\n", "and searching for $\\{X_1, \\dots, X_k\\}, X_i \\in X$ optimally aligned with respect to the prediction of the current regression model:\n", "$$\\min_{t\\in T}d_X(h_\\beta(s_i),t^TX_it)$$" ] }, { "cell_type": "code", "execution_count": 12, "id": "893da39b", "metadata": {}, "outputs": [], "source": [ "total_space = GraphSpace(n_nodes=5)\n", "total_space.equip_with_group_action()\n", "total_space.equip_with_quotient();" ] }, { "cell_type": "code", "execution_count": 13, "id": "6f33d152", "metadata": {}, "outputs": [], "source": [ "s = gs.array([random.randint(0, 10) for i in range(10)])" ] }, { "cell_type": "code", "execution_count": 14, "id": "a4cb1e48", "metadata": {}, "outputs": [], "source": [ "aac_reg = AAC(space=total_space, estimate=\"regression\")" ] }, { 
"cell_type": "code", "execution_count": 15, "id": "0a2152ae", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING: Maximum number of iterations 20 reached. The estimate may be inaccurate\n" ] } ], "source": [ "aac_reg.fit(s, graphset_1);" ] }, { "cell_type": "markdown", "id": "46835c03", "metadata": {}, "source": [ "The coefficients are saved in the following attributes and they can be changed into a graph shape." ] }, { "cell_type": "code", "execution_count": 16, "id": "9b71203c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0. ],\n", " [ 0.02115813],\n", " [ 0.00890869],\n", " [-0.0233853 ],\n", " [-0.00668151],\n", " [-0.02561247],\n", " [-0. ],\n", " [ 0.04899777],\n", " [-0.04120267],\n", " [ 0.02004454],\n", " [ 0.0233853 ],\n", " [ 0.06458797],\n", " [-0. ],\n", " [-0.06904232],\n", " [-0.02115813],\n", " [-0.02115813],\n", " [ 0.07238307],\n", " [ 0.04008909],\n", " [-0. ],\n", " [ 0.07572383],\n", " [-0. ],\n", " [-0.04788419],\n", " [ 0.0701559 ],\n", " [ 0.00445434],\n", " [-0. 
]])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aac_reg.total_space_estimator.coef_" ] }, { "cell_type": "markdown", "id": "874fefa2", "metadata": {}, "source": [ "Graphs can be predicted using the fitted model, and the corresponding prediction error (here, the sum of graph space distances between the observed and the predicted graphs) can be computed:" ] }, { "cell_type": "code", "execution_count": 17, "id": "b03476b8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16.93063035236635" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graph_pred = aac_reg.total_space_estimator.predict(s)\n", "\n", "gs.sum(graph_space.metric.dist(graphset_1, graph_pred))" ] } ], "metadata": { "backends": [ "numpy" ], "celltoolbar": "Tags", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }