{"id":279,"date":"2021-08-02T02:02:57","date_gmt":"2021-08-02T02:02:57","guid":{"rendered":"http:\/\/summergeometry.org\/sgi2021\/?p=279"},"modified":"2021-08-02T02:02:57","modified_gmt":"2021-08-02T02:02:57","slug":"embedding-hierarchical-data-in-hyperbolic-geometry","status":"publish","type":"post","link":"https:\/\/summergeometry.org\/sgi2021\/embedding-hierarchical-data-in-hyperbolic-geometry\/","title":{"rendered":"Embedding Hierarchical Data in Hyperbolic Geometry"},"content":{"rendered":"\n<p class=\"has-text-align-center\"><em>By: Zeltzyn Montes and Sneha Sambandam<\/em><\/p>\n\n\n\n<p>Many of the datasets we would like to perform machine learning on have an intrinsic hierarchical structure. This includes social networks, taxonomies, NLP sentence structure, and anything that can be represented as a tree. While the most common machine learning tools work in the Euclidean space, this space is not optimal for representing hierarchical data.<\/p>\n\n\n\n<p>When representing hierarchical data in the 2D space, we have two main objectives: to preserve the hierarchical relationships between parent and child nodes; and to ensure distances between nodes are somewhat proportional to the number of links in between. However, when representing this structure in the 2D Euclidean space, we run into a few limitations.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-3-1.png\" alt=\"\" class=\"wp-image-290\" width=\"316\" height=\"286\" srcset=\"https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-3-1.png 639w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-3-1-300x272.png 300w\" sizes=\"auto, (max-width: 316px) 100vw, 316px\" \/><figcaption>Fig 1. A tree with branching factor of 2 drawn in the Euclidean space. (Courtesy of Alison Pouplin)<\/figcaption><\/figure><\/div>\n\n\n\n<p>For example, let&#8217;s first draw a tree, with a branching factor of 2 (<em>Fig 1<\/em>). When our tree is very deep, placing nodes equidistantly causes us to run out of space rather quickly, like in our example above at only a depth of 5. The second problem we run into is with untrue distance relationships among nodes. To help visualize this phenomenon, refer to <em>Fig 2a<\/em>. Here, arc refers to the shortest path between two nodes.<\/p>\n\n\n\n<figure class=\"wp-block-gallery aligncenter columns-2 is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\"><ul class=\"blocks-gallery-grid\"><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"638\" height=\"576\" src=\"http:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-2-1.png\" alt=\"\" data-id=\"287\" data-full-url=\"http:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-2-1.png\" data-link=\"http:\/\/summergeometry.org\/sgi2021\/?attachment_id=287\" class=\"wp-image-287\" srcset=\"https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-2-1.png 638w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-2-1-300x271.png 300w\" sizes=\"auto, (max-width: 638px) 100vw, 638px\" \/><\/figure><\/li><li class=\"blocks-gallery-item\"><figure><img loading=\"lazy\" decoding=\"async\" width=\"637\" height=\"575\" src=\"http:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-1.png\" alt=\"\" data-id=\"288\" data-full-url=\"http:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-1.png\" data-link=\"http:\/\/summergeometry.org\/sgi2021\/?attachment_id=288\" class=\"wp-image-288\" srcset=\"https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-1.png 637w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-1-300x271.png 300w\" sizes=\"auto, (max-width: 637px) 100vw, 637px\" \/><\/figure><\/li><\/ul><figcaption class=\"blocks-gallery-caption\">                                 <span class=\"has-inline-color has-background-color\">&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212; &#8211;<\/span>Fig 2a.<span class=\"has-inline-color has-background-color\">&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;<span class=\"has-inline-color has-background-color\">&#8211;<\/span>&#8211;<span class=\"has-inline-color has-background-color\">&#8211;<\/span>&#8212;&#8212;&#8211;<\/span> <span class=\"has-inline-color has-background-color\">&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<span class=\"has-inline-color has-background-color\">&#8211;<\/span>&#8211;<span class=\"has-inline-color has-background-color\">&#8211;<\/span>&#8212;&#8211;<span class=\"has-inline-color has-background-color\">&#8211;<\/span>&#8211;<\/span> <span class=\"has-inline-color has-background-color\">&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;<\/span>Fig 2b.<br>                        Euclidean vs desired measurement of distance between nodes.<br>                                                       (Courtesy of Alison Pouplin)<\/figcaption><\/figure>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\"><\/div>\n<\/div>\n\n\n\n<p>In theory, since the length of the arc between the purple nodes is less than that of the blue nodes, the purple nodes should result in the smaller distance. However, the Euclidean distance between the purple nodes is larger than the distance between the blue nodes. With the Euclidean 2D space, we observe that siblings in later generations end up closer and closer together, rendering Euclidean distance almost meaningless, as it only grows linearly with depth. Instead, we need a better distance measure: one that somehow travels \u201cup\u201d a tree before going back down as shown in <em>Fig 2b<\/em> and that is analogous to tree distance that grows exponentially as shown in <em>Fig 3<\/em>.&nbsp;<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig3-1024x316.png\" alt=\"\" class=\"wp-image-284\" width=\"610\" height=\"188\" srcset=\"https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig3-1024x316.png 1024w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig3-300x93.png 300w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig3-768x237.png 768w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig3-1200x371.png 1200w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig3.png 1463w\" sizes=\"auto, (max-width: 610px) 100vw, 610px\" \/><figcaption>Fig 3. Desired measurement of distances between nodes. (Courtesy of Alison Pouplin)<\/figcaption><\/figure><\/div>\n\n\n\n<p>A solution for this is to add an extra dimension (<em>Fig 4<\/em>). Now, the tree follows a smooth manifold: a hyperboloid. A hyperboloid is a manifold with a negative sectional curvature in the <a rel=\"noreferrer noopener\" href=\"https:\/\/en.wikipedia.org\/wiki\/Minkowski_space\" data-type=\"URL\" data-id=\"https:\/\/en.wikipedia.org\/wiki\/Minkowski_space\" target=\"_blank\">Minkowski space<\/a>, drawn by rotating a parabola around its symmetrical axis.&nbsp;Note, the Minkowski space is similar to the Euclidean space except dimension 1 is treated differently, while other dimensions are treated similarly to the Euclidean space. <\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"312\" src=\"http:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig4-1024x312.png\" alt=\"\" class=\"wp-image-283\" srcset=\"https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig4-1024x312.png 1024w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig4-300x91.png 300w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig4-768x234.png 768w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig4-1536x468.png 1536w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig4-1200x365.png 1200w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincarefig4.png 1895w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Fig 4. Adding a fourth dimension to hierarchical data. (Courtesy of Alison Pouplin)<\/figcaption><\/figure><\/div>\n\n\n\n<p>Embedding the tree on a smooth manifold now allows us to compute geodesics (shortest distance between two points on a surface). We can also project the data onto a lower dimensional manifold, such as the Poincar\u00e9 disk model. This model can be derived using a stereoscopic projection of the hyperboloid model onto the unit circle of the <em>z = 0<\/em> plane (<em>Fig 5<\/em>).<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincare-1.png\" alt=\"\" class=\"wp-image-282\" width=\"324\" height=\"334\" srcset=\"https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincare-1.png 563w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/poincare-1-291x300.png 291w\" sizes=\"auto, (max-width: 324px) 100vw, 324px\" \/><figcaption>Fig 5. Projecting hyperbolic geodesic to Poincar\u00e9 disk. (Courtesy of Alison Pouplin)<\/figcaption><\/figure><\/div>\n\n\n\n<p>The basic idea of this stereoscopic projection is:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><em>Start with a point<strong> P<\/strong> on the hyperboloid we wish to map.<\/em><\/li><li><em>Extend <strong>P<\/strong> out to a focal point N = (0,0,\u22121) to form a line.<\/em><\/li><li><em>Project that line onto the z = 0 plane to find our point <strong>Q<\/strong> in the Poincar\u00e9 model.<\/em><\/li><\/ol>\n\n\n\n<p>The Poincar\u00e9 disk model is governed by the M\u00f6bius gyrovector space in the same way that Euclidean geometry is governed by the common vector space (i.e., we have different rules, properties and metrics that govern this space). While we won\u2019t go into the specifics of what constitutes a gyrovector space, we shall detail the properties of the Poincar\u00e9 disk that make it advantageous for representing hierarchical data.&nbsp;<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><a href=\"https:\/\/bjlkeng.github.io\/posts\/hyperbolic-geometry-and-poincare-embeddings\/\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-4.png\" alt=\"\" class=\"wp-image-291\" width=\"314\" height=\"300\" srcset=\"https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-4.png 501w, https:\/\/summergeometry.org\/sgi2021\/wp-content\/uploads\/2021\/07\/image-4-300x287.png 300w\" sizes=\"auto, (max-width: 314px) 100vw, 314px\" \/><\/a><figcaption>Fig 6. A tree with branching factor of 2 drawn on the Poincar\u00e9 disk. (<a href=\"https:\/\/arxiv.org\/pdf\/1705.08039.pdf\" data-type=\"URL\" data-id=\"https:\/\/arxiv.org\/pdf\/1705.08039.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Source<\/a>)<\/figcaption><\/figure><\/div>\n\n\n\n<p>In <em>Fig 6<\/em>, we can see a hierarchical tree with branching factor two embedding into a Poincar\u00e9 disk. Distances between all points are actually equal because distances grow exponentially as you move toward the edge of the disk.&nbsp;<\/p>\n\n\n\n<p>Some notable points of this model are the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>The arcs never reach the circumference of the circle. This is analogous to the geodesic on the hyperboloid extending out to infinity, that is, as the arc approaches the circumference it&#8217;s approaching the &#8220;infinity&#8221; of the plane.<\/li><li>This means distances at the edge of the circle grow exponentially as you move toward the edge of the circle (compared to their Euclidean distances).<\/li><li>Each arc approaches the circumference at a 90 degree angle, this just works out as a result of the math of the hyperboloid and the projection. The straight line in this case is a point that passes through the &#8220;bottom&#8221; of the hyperboloid at<em> (0,0,1)<\/em>.<\/li><\/ul>\n\n\n\n<p>With these properties we are able to represent data in a compact, visually pleasing and apt form using hyperbolic geometry. Current research explores how we can harness these properties in order to perform effective machine learning on hierarchical data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By: Zeltzyn Montes and Sneha Sambandam Many of the datasets we would like to perform machine learning on have an intrinsic hierarchical structure. This includes social networks, taxonomies, NLP sentence structure, and anything that can be represented as a tree. While the most common machine learning tools work in the Euclidean space, this space is [&hellip;]<\/p>\n","protected":false},"author":19,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,25],"tags":[21,22,23,24],"class_list":["post-279","post","type-post","status-publish","format-standard","hentry","category-math","category-sgi-research-projects","tag-geometry","tag-sgi","tag-sneha-sambandam","tag-zeltzyn-montes"],"_links":{"self":[{"href":"https:\/\/summergeometry.org\/sgi2021\/wp-json\/wp\/v2\/posts\/279","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/summergeometry.org\/sgi2021\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/summergeometry.org\/sgi2021\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/summergeometry.org\/sgi2021\/wp-json\/wp\/v2\/users\/19"}],"replies":[{"embeddable":true,"href":"https:\/\/summergeometry.org\/sgi2021\/wp-json\/wp\/v2\/comments?post=279"}],"version-history":[{"count":10,"href":"https:\/\/summergeometry.org\/sgi2021\/wp-json\/wp\/v2\/posts\/279\/revisions"}],"predecessor-version":[{"id":314,"href":"https:\/\/summergeometry.org\/sgi2021\/wp-json\/wp\/v2\/posts\/279\/revisions\/314"}],"wp:attachment":[{"href":"https:\/\/summergeometry.org\/sgi2021\/wp-json\/wp\/v2\/media?parent=279"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/summergeometry.org\/sgi2021\/wp-json\/wp\/v2\/categories?post=279"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/summergeometry.org\/sgi2021\/wp-json\/wp\/v2\/tags?post=279"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}