
Hex Marks the Spot

How can we extend the 2-close interaction to a general n-close interaction?

That was the question at the end of the triangle’s hex blog post. This post goes through the thought process and tools used to solve for the sequence that extends the 2-close interaction to a general \(n\)-close interaction.

The approach here focused on analyzing the rotation of the dual structure and determining the precise amount of rotation required to arrive at the desired \(n\)-case configuration. This result was obtained by solving a relatively simple Euclidean geometry subproblem, of the kind sometimes encountered as a canonical problem at the secondary or high-school level. Through this perspective, a closed-form starting step for the general sequence was successfully derived.

OPEN FOR SPOILER FOR ANSWER

The rotation angle, \(\theta\), required to obtain a part of the \(n\)-close interaction term in the dual is given by:

$$
\theta = 60 \ - \ \sin^{-1} \left( \frac{3 n}{2 \sqrt{3 (n^2 + n + 1)}} \right)
$$

As with many mathematical problems, there are multiple approaches to reach the same result. I really encourage you to try it yourself, but without further ado, here is the thought process that led to this solution.

By the way, the working name for this article was
Hex and the City: Finding Your N-Closest Neighbors.
If you read all this way, I think you at least deserve to know that.


Constructing the Solution

We start off where we left off in the Auxiliary Insights section of the last blog post.

Recall that a rotation by \(60^\circ\) returns us to the original dual line due to the underlying sixfold symmetry. For the \(2\)-close interaction, we introduced a new dual by halving this angle to \(30^\circ\). Naturally, to explore the \(3\)-close case, one might attempt the naive approach of halving again to \(15^\circ\). However, as seen below, this approach does not yield the desired result, indicating that the simple halving approach cannot be extended straightforwardly.

By manually guessing and checking the rotation values, we end up with a somewhat unexpected sequence, one that, for me at least, was not immediately obvious.

This hints at the need to uncover a more subtle structure that lets us analytically determine the values we seek.

Some prerequisite background is provided in the next section, but if you’re already familiar, feel free to skip ahead to the solution process that builds on these ideas.


Some useful mathematical prerequisites that could help

Modular Arithmetic

A Quick Introduction to Modular Arithmetic

Modular arithmetic is a system of arithmetic for integers where numbers “wrap around” after reaching a certain value called the modulus. Rather than focusing on the full quotient of a division, modular arithmetic concerns itself with the remainder. It is a foundational concept in number theory and has wide applications in cryptography, computer science, and abstract algebra.

Motivation

Suppose you are working with a standard 12-hour clock. If the current time is 9 and 5 hours pass, the clock shows 2, not 14. This is because clocks operate “modulo 12”, meaning we consider two numbers equivalent if they differ by a multiple of 12. So:

$$
9 + 5 = 14 \equiv 2 \pmod{12}
$$

This is a basic example of modular arithmetic: rather than computing the full sum, we reduce it to its residue modulo 12.

Definition

Let \(n \) be a fixed positive integer, called the modulus. For any integers \(a \) and \(b \), we say:

$$
a \equiv b \pmod{n}
$$

(read as “a is congruent to b modulo n”) if and only if \(n \) divides \(a - b \); that is, there exists an integer \(k \) such that:

$$
a - b = kn
$$

This defines an equivalence relation on the set of integers \(\mathbb{Z} \), which partitions \(\mathbb{Z} \) into \(n \) distinct congruence classes modulo \(n \). Each class contains all integers with the same remainder upon division by \(n \).

For example, with \(n = 5 \), the integers are grouped into the following classes:

$$
[0] = \{ \dots, -10, -5, 0, 5, 10, \dots \}
$$

$$
[1] = \{ \dots, -9, -4, 1, 6, 11, \dots \}
$$

$$
[2] = \{ \dots, -8, -3, 2, 7, 12, \dots \}
$$

$$
[3] = \{ \dots, -7, -2, 3, 8, 13, \dots \}
$$

$$
[4] = \{ \dots, -6, -1, 4, 9, 14, \dots \}
$$

Each integer belongs to exactly one of these classes. For computational purposes, we usually represent each class by its smallest non-negative element, i.e., by the residue \(r \in \{0, 1, \dots, n-1\} \).

Arithmetic in \(\mathbb{Z}/n\mathbb{Z} \)

The set of congruence classes modulo \(n \), denoted \(\mathbb{Z}/n\mathbb{Z} \), forms a ring under addition and multiplication defined by:

  • \([a] + [b] := [a + b] \)
  • \([a] \cdot [b] := [ab] \)

These operations are well-defined: if \(a \equiv a' \pmod{n} \) and \(b \equiv b' \pmod{n} \), then \(a + b \equiv a' + b' \pmod{n} \) and \(ab \equiv a'b' \pmod{n} \).

If \(n \) is prime, \(\mathbb{Z}/n\mathbb{Z} \) is a finite field, meaning every nonzero element has a multiplicative inverse.
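To make these definitions concrete, here is a small Python sketch (my own illustration, not part of the original text): the `%` operator reduces an integer to its residue in \(\{0, 1, \dots, n-1\}\), and `pow(a, -1, n)` (Python 3.8+) returns a multiplicative inverse when one exists.

```python
# A minimal sketch of modular arithmetic in Python (illustrative only).

print((9 + 5) % 12)          # 2: the clock example, 14 ≡ 2 (mod 12)
print((-8) % 5, 7 % 5)       # both 2: -8 and 7 lie in the same class [2] mod 5
print(pow(3, -1, 7))         # 5: the inverse of 3 mod 7, since 3 * 5 = 15 ≡ 1 (mod 7)

def congruent(a, b, n):
    """True iff n divides a - b, i.e. a ≡ b (mod n)."""
    return (a - b) % n == 0

print(congruent(14, 2, 12))  # True
```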

Formal Summary

Modular arithmetic generalizes the idea of remainders after division. Formally, for a fixed modulus \(n \in \mathbb{Z}_{>0} \), we say:

$$
a \equiv b \pmod{n} \iff n \mid (a - b)
$$

This defines a congruence relation on \(\mathbb{Z} \), leading to a system where calculations “wrap around” modulo \(n \). It is a central structure in elementary number theory, forming the foundation for deeper topics such as finite fields, Diophantine equations, and cryptographic algorithms.

Law Of Sines

Derivation of the Law of Sines

The Law of Sines is a fundamental result in trigonometry that relates the sides of a triangle to the sines of their opposite angles. This elegant identity allows us to find unknown angles or sides in oblique triangles using simple trigonometric ratios.

Setup and Construction

Consider an arbitrary triangle \(\triangle ABC \) with:

  • \(a = BC \)
  • \(b = AC \)
  • \(c = AB \)

To derive the Law of Sines, we draw an altitude from vertex \(A \) perpendicular to side \(BC \), and denote this height as \(h \).

Now consider the two right triangles formed by this altitude:

  • In the right triangle containing angle \(B \) (with hypotenuse \(AB = c \)): $$
    \sin B = \frac{h}{c} \quad \Rightarrow \quad h = c \sin B
    $$
  • In the right triangle containing angle \(C \) (with hypotenuse \(AC = b \)): $$
    \sin C = \frac{h}{b} \quad \Rightarrow \quad h = b \sin C
    $$

Since both expressions equal \(h \), we equate them:

$$
c \sin B = b \sin C
$$

Rewriting this gives:

$$
\frac{b}{\sin B} = \frac{c}{\sin C}
$$

We can repeat this construction by drawing altitudes from the other vertices to derive similar relationships:

$$
\frac{a}{\sin A} = \frac{b}{\sin B}, \quad \frac{a}{\sin A} = \frac{c}{\sin C}
$$

Combining all, we obtain the complete Law of Sines:

$$
\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C}
$$

Geometric Interpretation

The Law of Sines implies that in any triangle, the ratio of a side to the sine of its opposite angle is constant. In fact, this constant equals \(2R \), where \(R \) is the radius of the circumcircle of the triangle:

$$
\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C} = 2R
$$

This reveals a deep geometric connection between triangles and circles, reflecting the underlying symmetry present in Euclidean geometry.
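As a quick numerical sanity check of this identity (my own sketch, on an arbitrarily chosen triangle), all three ratios can be compared against \(2R \):

```python
# Verify a/sin A = b/sin B = c/sin C = 2R for one concrete triangle (sketch).
import math

A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)   # arbitrary vertices

def dist(P, Q):
    return math.hypot(P[0] - Q[0], P[1] - Q[1])

a, b, c = dist(B, C), dist(A, C), dist(A, B)    # sides opposite A, B, C

def angle(opposite, s1, s2):
    # Law of Cosines rearranged to recover the angle opposite `opposite`
    return math.acos((s1**2 + s2**2 - opposite**2) / (2 * s1 * s2))

alpha, beta, gamma = angle(a, b, c), angle(b, a, c), angle(c, a, b)

# circumradius via R = abc / (4 * Area), with the area from the cross product
area = 0.5 * abs((B[0]-A[0]) * (C[1]-A[1]) - (B[1]-A[1]) * (C[0]-A[0]))
R = a * b * c / (4 * area)

print(a / math.sin(alpha), b / math.sin(beta), c / math.sin(gamma), 2 * R)
# all four numbers agree up to floating-point error
```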

Law Of Cosines (generalisation of Pythagorean theorem)

Derivation of the Cosine Angle Formula

The cosine angle formula, also known as the Law of Cosines, relates the lengths of the sides of a triangle to the cosine of one of its angles.

Motivation

Consider a triangle with sides of lengths \(a \), \(b \), and \(c \), and let \(\theta \) be the angle opposite side \(c \). When \(\theta \) is a right angle, the Pythagorean theorem applies:

$$
c^2 = a^2 + b^2
$$

For any angle \(\theta \), the Law of Cosines generalizes this relationship:

$$
c^2 = a^2 + b^2 - 2ab \cos \theta
$$

This formula allows us to find a missing side or angle in any triangle, not just right triangles.

Setup and Coordinate Placement

To derive the formula, place the triangle in the coordinate plane:

  • Place vertex \(A \) at the origin \((0, 0) \).
  • Place vertex \(B \) on the positive x-axis at \((c, 0) \).
  • Place vertex \(C \) at \((x, y) \), such that the angle at \(A \) is \(\theta \).

Here we label the sides in the standard way:

  • \(AB = c \)
  • \(AC = b \)
  • \(BC = a \)

Since \(\theta \) is the angle at \(A \), the coordinates of \(C \) can be expressed using polar coordinates relative to \(A \):

$$
C = (b \cos \theta, b \sin \theta)
$$

Distance Calculation

The length \(a = BC \) is the distance between \(B(c, 0) \) and \(C(b \cos \theta, b \sin \theta) \):

$$
a^2 = (b \cos \theta - c)^2 + (b \sin \theta - 0)^2
$$

Expanding:

$$
a^2 = (b \cos \theta - c)^2 + (b \sin \theta)^2
= (b \cos \theta)^2 - 2bc \cos \theta + c^2 + b^2 \sin^2 \theta
$$

Simplify using the fundamental trigonometric identity \(\cos^2 \theta + \sin^2 \theta = 1 \):

$$
a^2 = b^2 \cos^2 \theta - 2bc \cos \theta + c^2 + b^2 \sin^2 \theta = b^2(\cos^2 \theta + \sin^2 \theta) - 2bc \cos \theta + c^2
$$

$$
a^2 = b^2 - 2bc \cos \theta + c^2
$$

This is the Law of Cosines; it states that for any triangle with sides \(a, b, c \) and angle \(\theta \) opposite side \(a \):

$$
a^2 = b^2 + c^2 - 2bc \cos \theta
$$

This formula generalizes the Pythagorean theorem and allows computation of unknown sides or angles in any triangle, making it fundamental in trigonometry and geometry.
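A short numerical check of the derivation (again a sketch of my own): place the triangle exactly as above and compare the coordinate distance with the Law of Cosines prediction.

```python
# Compare |BC| computed from coordinates with the Law of Cosines (sketch).
import math

b, c, theta = 3.0, 5.0, math.radians(70)        # arbitrary side lengths and angle at A

B = (c, 0.0)
C = (b * math.cos(theta), b * math.sin(theta))

a_coords = math.dist(B, C)                                        # direct distance
a_law = math.sqrt(b**2 + c**2 - 2 * b * c * math.cos(theta))      # Law of Cosines

print(a_coords, a_law)   # the two values agree
```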

Height of a Regular Hexagon of Side Length, \(a\)

Find the height of a regular hexagon with side length \(a\)

Understand the structure of a regular hexagon

  • A regular hexagon can be divided into 6 equilateral triangles.
  • Each triangle has side length \(a\).
  • The height of the hexagon is the distance between two parallel sides, i.e., it’s twice the height of one equilateral triangle formed inside.

So:

$$
\text{height of hexagon} = 2 \times \text{height of equilateral triangle with side } a
$$

Use formula for height of an equilateral triangle

The height \(h\) of an equilateral triangle with side length \(a\) is:

$$
h = a \cdot \sin(60^\circ)
$$

We use this since the height splits the triangle into two right triangles with angles \(30^\circ, 60^\circ, 90^\circ\).

Substitute:

$$
\sin(60^\circ) = \frac{\sqrt{3}}{2}
$$

So:

$$
h = a \cdot \frac{\sqrt{3}}{2}
$$

Multiply by 2 to get full hexagon height

$$
\text{Height of hexagon} = 2 \cdot \left( a \cdot \frac{\sqrt{3}}{2} \right)
$$

Yielding Result:

$$
\text{Height of a regular hexagon with side length } a = a \sqrt{3}
$$

Limits at infinity

Limits at Infinity: A Conceptual Overview

In mathematical analysis, the notion of a limit at infinity provides a rigorous framework for understanding the behavior of functions as their input grows arbitrarily large in magnitude. Specifically, we are interested in the asymptotic behavior of a function \(f(x) \) as \(x \to \infty \) (or \(x \to -\infty \)), that is, how \(f(x) \) behaves for sufficiently large values of \(x \).

Formally, we say that
$$
\lim_{x \to \infty} f(x) = L
$$
if for every \(\varepsilon > 0 \), there exists a real number \(N \) such that
$$
x > N \Rightarrow |f(x) - L| < \varepsilon.
$$
This definition captures the idea that the values of \(f(x) \) can be made arbitrarily close to \(L \) by taking \(x \) sufficiently large.

Limits at infinity play a crucial role in the study of convergence, asymptotics, and the global behavior of functions.


Continuing to derive the solution

Simplifying the focus

By exploiting the inherent sixfold rotational symmetry, we reduce the problem “modulo 6”-style, effectively restricting our attention to a single fundamental sector (i.e., one \(60^\circ\) sector). By doing so, we simplify the analysis to just one representative section within this equivalence class.

If we restrict our view along the direction of the blue arrow, we observe that the rotation lines align to form a straight “line” sequence. This linear arrangement reveals an underlying order that we can potentially use.

To proceed, we now seek an invariant: a quantity or structure that remains stable under the transformations under consideration. The natural approach here is to introduce an auxiliary red arrow, aligned with the reference direction. By connecting this to each rotation line, we form triangles with a common “base”.

Let us now regard each line as encoding an angular offset, specifically, a rotation relative to a fixed reference direction given by the dark blue line (dual). We can think of the red arrow as a kind of basis vector corresponding to the blue line. Our aim is to compute the rotational offset of each line with respect to this reference frame, treating the red arrow as a canonical representative of the direction encoded by the dark blue line.

Finding the general formula for the offset angle between the red arrow (a representation of the original dual) AND the \(n\)-case triangle

Dotted lines here are the lines we saw in the image earlier

Given:

  • Triangle with two sides:
    • Side 1: \(a \sqrt{3}\)
    • Side 2: \(a n \sqrt{3}\)
  • Included angle between them: \(120^\circ\)
  • Goal: Find the angle \(A\), opposite to the side \(a n \sqrt{3}\)

Step 1: Use Law of Cosines to find the third side

$$
c^2 = \alpha^2 + \beta^2 - 2 \alpha \beta \cos(C)
$$

Assign:

  • \( \alpha = a \sqrt{3}\)
  • \(\beta = a n \sqrt{3}\)
  • \(C = 120^\circ\)

$$
c^2 = (a \sqrt{3})^2 + (a n \sqrt{3})^2 - 2 (a \sqrt{3})(a n \sqrt{3}) \cos(120^\circ)
$$

Evaluate:

  • \((a \sqrt{3})^2 = 3 a^2\)
  • \((a n \sqrt{3})^2 = 3 a^2 n^2\)
  • \(2 \alpha \beta = 6 a^2 n\)
  • \(\cos(120^\circ) = -\frac{1}{2}\)

$$
c^2 = 3 a^2 + 3 a^2 n^2 + 3 a^2 n
$$

$$
c^2 = 3 a^2 (n^2 + n + 1)
$$

$$
c = a \sqrt{3 (n^2 + n + 1)}
$$

Step 2: Use Law of Sines to find the angle opposite to \(a n \sqrt{3}\)

$$
\frac{\sin A}{a n \sqrt{3}} = \frac{\sin(120^\circ)}{c}
$$

Substitute:

  • \(\sin(120^\circ) = \frac{\sqrt{3}}{2}\)
  • \(c = a \sqrt{3 (n^2 + n + 1)}\)

$$
\sin A = \frac{a n \sqrt{3}}{a \sqrt{3 (n^2 + n + 1)}} \times \frac{\sqrt{3}}{2}
$$

Cancel \(a\):

$$
\sin A = \frac{n \sqrt{3}}{\sqrt{3 (n^2 + n + 1)}} \times \frac{\sqrt{3}}{2}
$$

$$
\sin A = \frac{3 n}{2 \sqrt{3 (n^2 + n + 1)}}
$$

Final Result for the ? angle:

$$
A = \sin^{-1} \left( \frac{3 n}{2 \sqrt{3 (n^2 + n + 1)}} \right)
$$
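For reference, here is a small Python sketch (my own, simply evaluating the formula just derived) that lists this offset angle for the first few values of \(n\); the values climb toward \(60^\circ\), which is made precise by the limit argument below.

```python
# Evaluate the offset angle A(n) in degrees for small n (sketch).
import math

def offset_angle_deg(n):
    return math.degrees(math.asin(3 * n / (2 * math.sqrt(3 * (n**2 + n + 1)))))

for n in range(0, 7):
    print(n, round(offset_angle_deg(n), 3))
# 0 -> 0.0, 1 -> 30.0, 2 -> 40.893, 3 -> 46.102, ... creeping toward 60
```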

Adjusting the Sequence for clearer understanding of the Asymptotics

We can now utilize the function defined above to generate the desired terms in the sequence. This brings us close to our objective. Upon closer inspection, we observe that the sequence aligns particularly well when interpreted as an angular offset. Specifically, as illustrated in the image below, we begin with the blue line corresponding to the case \(n = 0\), and for each subsequent value of \(n\), we proceed in a clockwise direction. (These lines correspond to the dotted rays shown in the previous image.)


Drawing a link between the symmetry and this

Originally, by leveraging the underlying symmetry of the problem, we simplified the analysis by focusing on a single direction; in other words, we can intuitively infer that the sequence “resets” every 60 degrees.

Now instead, we can analyze the limiting behavior of the relevant angle in the \(n\)-th triangle considered earlier. As \(n\) grows large, the angle opposite the red line (the \(a \sqrt{3}\) side), which we denote by \(\omega\), decreases towards zero. Formally, this can be expressed as

$$
\lim_{n \to \infty} \omega = 0.
$$

Using the triangle angle sum, the angle of interest (the “?”, but let's call it \(\phi\)) satisfies

$$
\phi = 180^\circ - 120^\circ - \omega.
$$

Therefore,

$$
\lim_{n \to \infty} \phi = 60^\circ,
$$

showing that the sequence converges to \(60^\circ\).

Final Adjustment of the formula to reflect the rotation of the original dual and the intuition above

Thus, from both a mathematical and (okay, maybe more so) an aesthetic standpoint, it is preferable to redefine the sequence so that it converges to zero. This aligns with the conceptual goal of determining ‘how much to rotate’ the original dual structure to obtain the \(n\)-th dual. In its current form, the sequence converges to a fixed offset (specifically, \(60^\circ\)), which, at first glance, is not immediately apparent from the formula.

To clarify what is meant by ‘converging to zero’: when we rotate the original dual by any multiple of \(60^\circ\) (i.e., an element of \(60\mathbb{Z}\)), we return to the original dual; similarly, a rotation by any element of \(30\mathbb{Z}\) yields the 2-close interaction. In this context, we seek a rotation angle \(\theta\) such that \(\theta \cdot \mathbb{Z}\) captures the \(n\)-close interaction, and thus it is natural to define \(\theta\) as a function of \(n\) that tends to zero as \(n \to \infty\). This reframing reveals the underlying structure more clearly and places the emphasis on the relative rotation required to reach the \(n\)-th configuration from the base case.

This sequence tending to zero carries a natural interpretation: it suggests a diminishing deviation or adjustment over time. This aligns well with the intuition that as \(n\) increases, the system or pattern is “settling down” or tending to the reset point. In contrast, convergence to a nonzero constant may obscure this interpretation (unless you know a priori about the symmetries of this system), since the presence of a persistent offset implies a kind of residual asymmetry or imbalance. By ensuring that the sequence vanishes asymptotically, we emphasize convergence toward a limiting behavior centered at zero.

This was the justification for introducing the subtraction from \(60\) in front of the \(A\) derived earlier, so that the sequence is expressed in the original notion of rotating the dual. This yields

$$
\theta = 60 \ - \ A
$$

$$
\theta = 60 \ - \ \sin^{-1} \left( \frac{3 n}{2 \sqrt{3 (n^2 + n + 1)}} \right).
$$

After this adjustment, hopefully it becomes apparent, even from a quick inspection of the formula, that the constant \(60\) plays a fundamental role in the limiting structure of the problem.
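A quick numerical check (my own sketch) that the adjusted sequence really does decay toward zero:

```python
# Evaluate theta(n) = 60 - A(n) in degrees for increasing n (sketch).
import math

def theta_deg(n):
    return 60 - math.degrees(math.asin(3 * n / (2 * math.sqrt(3 * (n**2 + n + 1)))))

for n in [1, 2, 3, 5, 10, 100, 10_000]:
    print(n, round(theta_deg(n), 6))
# the printed values shrink toward 0, consistent with the limit above
```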

Thus we have

The Problem is not finished

Now, reintroducing the 6-fold symmetry and examining the superposition of plots corresponding to the 1-close through \(n\)-close interactions (for increasingly large values of \(n\)), we observe:

But wait … this does not cover the full grid. It’s clearly missing parts! You scammed me! Well, yes, but not quite, as I did mention at the start:

“…closed-form starting step

Srivathsan Amruth,
You really didn’t think I was going to give you the full answer right…

I intentionally stopped here, as I want you to explore the problem further and realize that there are, in fact, additional symmetries you can exploit to fully cover the hexagonal grid. I’ll leave you with the following hint to start:

Consider the 1-ray case and its associated \(60^\circ\) cone.

Now look at the 3-close case (green) and the 2-close case (purple). Ask yourself: can I flip something to fill in the missing part of the 3-close structure?

Then, move on to the 4-close case. Since (spoiler) there will be two distinct 3-close components involved, can I once again use flipping to complete the picture? And if so, how many times, and where, should these flips occur?

It’s getting quite recursive from here… but I’ll let you figure out the rest.

Also, I’ll leave you with one final thought: is there a way to avoid this recursive construction altogether or simplify the approach, and instead generate the full structure more efficiently using some iterative or closed-form approach?

Wishing you a hex-tra special day ahead with every piece falling neatly into place.

Srivathsan Amruth,
i.e., the person guiding you through this symmetry-induced spiral.

The Triangle’s Hex

An Open Challenge

The goal of this post is to celebrate the question. History’s greatest intellectual leaps often began not with the answer itself, but with the journey of tackling a problem. Hilbert’s 23 problems charted the course of a good chunk of 20th-century mathematics. Erdős scattered challenges that led to breakthroughs in entire fields. The Millennium Prize Problems continue to inspire and frustrate problem solvers.

There’s magic in a well-posed problem; it ignites curiosity and forces innovation. Since I am nowhere near the level of Erdős or Hilbert, today we embrace that spirit with a perhaps less well-posed, but deceptively simple yet intriguing, geometric puzzle on a disordered surface.

To this end, we present a somewhat geometric-combinatorial problem that bridges aspects of computational geometry, bond percolation theory and graph theory.

Problem Construction

We begin with the hexagonal lattice in \(\mathbb{R}^2\) (the two-dimensional plane), specifically the regular hexagonal lattice (also known as the honeycomb lattice). For visualisation, a finite subregion of this lattice tessellation of the plane is illustrated below.

We then enrich the information of our space by constructing centroids associated with each hexagon in the lattice. The guiding idea is to represent each face of the lattice by its centroid, treating these points as abstract proxies for the faces themselves.

These centroids are connected precisely when their corresponding hexagons share a common edge in the original lattice. Due to the hexagonal tiling’s geometry, this adjacency pattern induces a regular network of connections between centroids, reflecting the staggered arrangement of the hexagons.

This process yields a new structure: the dual lattice. In this case, the resulting dual is the triangular lattice. This dual lattice encodes the connectivity of the original tiling’s faces through its vertices and edges.

As a brief preview of the questions to come, our goal is to identify triangles within this expansive lattice. We begin by requiring that the vertices of these triangles originate from the centroids of the hexagonal faces. To then encode such triangular structures, we leverage both the primal and dual lattices, combining them so that the edges of these triangles can exist on any of the black paths shown in the diagram.

What is particularly interesting here is that the current construction arises from 1-neighbor connections, which we interpret as close-range interactions. However, this 1-neighbor primal-dual framework restricts the set of possible triangles to only equilateral ones. (Personally, I feel this limits the complexity of potential problems.)

On this grid, every triangle’s got equal sides; I guess this side of math is tired of drama. Trying to spot a scalene here is like trying to find a unicorn in a math textbook.

To make the question more engaging and challenging, we seek to incorporate more “medium-to-further-range interactions”. This can be accomplished by connecting every second neighbor in an analogous fashion, thereby constructing a “triple space”. The resulting structure forms another dual lattice that captures these extended adjacency relations, allowing for a richer variety of triangular configurations beyond the equilateral case. For generality, I will coin the term 2-close-range interactions for this.

A lazy yet elegant approach to achieve this is by rotating the original dual lattice by \(90\) degrees, or, more generally, by any multiple of \(30\) degrees (i.e., \(30 \times n\) degrees; we will use this generality in the final section to extend the problem scope even further). We then combine this rotated lattice with the original construction.

We will illustrate the process below.

The 2-close-range interaction:

We begin, as before, with the original primal and dual lattices.

We now rotate the blue dual lattice by \(90\) degrees, as described previously.

Combining the rotated dual with the original, as before, yields the following structure and, as the keen-eyed among you may notice, both dual constructions together cover the entire original primal hexagonal lattice.

This finally yields the following existence space for the triangles.

And now, we witness the emergence of a richer and more interesting variety of triangle configurations.

Wow, so many triangles to choose from here! Great, just what I wanted, more problems (to select from)… wait, hold up…

The Random Interface

To introduce an interesting stochastic element to our problem, we draw inspiration from percolation theory (a framework in statistical physics that examines how randomness influences the formation of connected clusters within a network). Specifically, we focus on the concept of disorder, exploring how the presence of randomness or irregularities in a system affects the formation of connected clusters. In our context, this translates to the formation of triangles within the lattice.

Building upon the original hexagonal lattice, we assign to each edge a probability \(p\) of being “open” or “existent”, and consequently, a probability \(1 - p\) of being “closed” or “nonexistent”. This approach mirrors the bond percolation model, where edges are randomly assigned to be open or closed, independent of other edges. The open edges allow for the traversal of triangle edges, akin to movement through a porous medium, facilitating the formation of triangles only through the connected open paths.
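As a minimal illustration (my own sketch, not code from the post), the disorder step amounts to one independent Bernoulli(\(p\)) draw per hexagon edge:

```python
# Independently mark each hexagon edge as open (True) or closed (False) (sketch).
import random

def sample_open_edges(edges, p, seed=None):
    """edges: iterable of hashable edge identifiers; returns {edge: is_open}."""
    rng = random.Random(seed)
    return {e: (rng.random() < p) for e in edges}

# toy usage on a handful of labelled edges between cells c0..c3
edges = [("c0", "c1"), ("c0", "c2"), ("c1", "c2"), ("c2", "c3")]
print(sample_open_edges(edges, p=0.5, seed=1))
```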

We illustrate using the figure a sample configuration when \(p = 0.5\), where the red “exit” edges represent “open” edges that allow triangles to traverse through the medium.

Rules regarding traversal of this random interface:

As mentioned earlier, every red “open” gate permits traversal. To define this traversal more clearly in our problem setting, we can visualise using the following.

In the 2-close-range interaction case, this traversal is either in the tangential direction

Apologies for the somewhat messy hand-drawn centroids and wonky arrows…

or in the normal direction.

Once again ignore the horribly hand-drawn centroids please… Also note that in both cases, the arrows (specifically the arrowheads) are included purely for illustrative purposes. The traversal can be considered bidirectional, depending on the context; that is, each triangle may be treated either as an undirected or a directed graph, depending on the construction method being employed by the problem solver (i.e you 😉 ).
An aside on Interaction Modeling

This model assumes that long-range interactions emerge from specified short-range interactions. We can see this in our construction of the 2-close explicitly excluding any secondary dual lattice edges (2-close interactions) that can be generated from first-neighbor edges (1-close interactions). This ensures that the secondary dual lattice contains only truly second-neighbor interactions that are not implied by (or that pass through) the first-neighbor structure, preserving a clear hierarchical interaction scheme.

The Problems

An “Easy” starting question:

Given some finite subsection of this random interface, construct an algorithm to find the largest triangle (in terms of perimeter or area; the choice of definition depends on which variant of the problem you want to solve).

Attempt at formalising this
1) Definition of Lattices
Hexagon Lattice

Let the hexagonal lattice be denoted by:
$$
H = (C, E_h)
$$


where:

  • \(C = \{ c_i \} \subset \mathbb{R}^2\) is the set of convex hexagonal cells that partition the Euclidean plane \(\mathbb{R}^2\),
  • \(E_h = \{ (c_i, c_j) \in C \times C \mid c_i \text{ and } c_j \text{ share a common edge} \}\) is the adjacency relation between hexagons.

Each hexagon edge \(e_h \in E_h\) is activated independently with probability \(p \in [0,1]\):
$$
\Pr(\sigma_h(e_h) = 1) = p, \quad \Pr(\sigma_h(e_h) = 0) = 1-p
$$

Primary Dual Lattice \(T_1 = (V, E_t^{(1)})\)

The primary dual lattice connects the centroids of adjacent hexagons, where \(V = \{ v_i \}\) denotes the set of centroids \(v_i\) of the cells \(c_i\):
$$
E_t^{(1)} = \left \{ (v_i, v_j) \in V \times V \,\middle|\, (c_i, c_j) \in E_h \right \}
$$

Secondary Dual Lattice \(T_2 = (V, E_t^{(2)})\)

The secondary dual lattice captures 2-close interactions. An edge \((v_i, v_j) \in E_t^{(2)}\) exists if and only if:

  1. \(c_i \neq c_j\) and \((v_i, v_j) \notin E_t^{(1)}\) (i.e., not first neighbors),
  2. There exists a common neighboring cell \(c_k \in C\) such that:
    $$
    (c_i, c_k) \in E_h \quad \text{and} \quad (c_j, c_k) \in E_h
    $$
  3. The straight-line segment \(\overline{v_i v_j}\) does not lie entirely within \(c_i \cup c_j \cup c_k\):
    $$
    \overline{v_i v_j} \not\subseteq c_i \cup c_j \cup c_k
    $$

Hence, the formal definition:
$$
E_t^{(2)} = \left\{ (v_i, v_j) \in V \times V \,\middle|\,
\begin{aligned}
& c_i \ne c_j,\ (v_i, v_j) \notin E_t^{(1)}, \\
& \exists\, c_k \in C:\ (c_i, c_k), (c_j, c_k) \in E_h, \\
& \overline{v_i v_j} \not\subseteq c_i \cup c_j \cup c_k
\end{aligned}
\right\}
$$

Combined Dual Lattice

The complete dual lattice is defined as:
$$
T = (V, E_t), \quad \text{where} \quad E_t = E_t^{(1)} \cup E_t^{(2)}
$$

2) Edge Activation
Hexagon Edge Activation

Each hexagon edge \(e_h \in E_h\) is independently activated with probability \(p\) as:
$$
\sigma_h(e_h) \sim \text{Bernoulli}(p)
$$

Dual Edge Activation

Each dual edge \(e = (v_i, v_j) \in E_t\) has an activation state \(\sigma(e) \in \{0,1\}\) defined by the following condition:

$$
\sigma(e) =
\begin{cases}
1 & \text{if } \overline{v_i v_j} \text{ lies on, or is perpendicular to, some } e_h \in E_h \text{ with } \sigma_h(e_h) = 1, \\
0 & \text{otherwise.}
\end{cases}
$$

In other words, a dual edge is activated only if it corresponds spatially to a hexagon edge that is itself activated, either by lying exactly on it or intersecting it at right angles.

3) Interaction Constraints

Local Traversal: Paths across the lattice are constrained to activated edges:
$$
P(v_a \rightarrow v_b) = \{ e_1, e_2, \dots, e_k \mid \sigma(e_i) = 1,\ e_i \in E_t \}
$$

Cardinality:
$$
\deg_{T_1}(v_i) \leq 6, \quad \deg_{T_2}(v_i) \leq 6
$$
Each vertex connects to at most 6 first neighbors (in \(T_1\)) and at most 6 geometrically valid second neighbors (in \(T_2\)).

4) Triangle Definition

A valid triangle \(\Delta_{abc} = (v_a, v_b, v_c)\) must satisfy:

  1. Connectivity: All three edges are activated:
    $$
    \sigma(v_a v_b) = \sigma(v_b v_c) = \sigma(v_c v_a) = 1
    $$
  2. Embedding: The vertices \((v_a, v_b, v_c)\) form a non-degenerate triangle in \(\mathbb{R}^2\), preserving local lattice geometry.
  3. Perimeter: Defined as the total edge length:
    $$
    P(\Delta_{abc}) = |v_a - v_b| + |v_b - v_c| + |v_c - v_a|
    $$
  4. Area: Using the standard formula for triangle area from vertices:
    $$
    A(\Delta_{abc}) = \frac{1}{2} \left| (v_b – v_a) \times (v_c – v_a) \right|
    $$
    where \(\times\) denotes the 2D scalar cross product (i.e., determinant).

Triangles may include edges from either \(E_t^{(1)}\) or \(E_t^{(2)}\) as long as they meet the activation condition.
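The perimeter and area definitions above translate directly into code; the following short sketch (vertices given as 2D coordinate tuples) is just a transcription of those formulas.

```python
# Perimeter and area of a triangle from its vertex coordinates (sketch).
import math

def perimeter(va, vb, vc):
    return math.dist(va, vb) + math.dist(vb, vc) + math.dist(vc, va)

def area(va, vb, vc):
    # 2D scalar cross product of (vb - va) and (vc - va)
    cross = (vb[0] - va[0]) * (vc[1] - va[1]) - (vb[1] - va[1]) * (vc[0] - va[0])
    return 0.5 * abs(cross)

print(perimeter((0, 0), (1, 0), (0, 1)), area((0, 0), (1, 0), (0, 1)))
# 3.414..., 0.5
```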

5) Optimization Objective

Given the dual lattice \(T = (V, E_t)\) and activation function \(\sigma: E_t \rightarrow \{0,1\}\), the goal is to find the valid triangle maximizing either:

  • Perimeter:
    $$
    \underset{\Delta_{abc}}{\mathrm{argmax}} \; P(\Delta_{abc})
    $$
  • Area:
    $$
    \underset{\Delta_{abc}}{\mathrm{argmax}} \; A(\Delta_{abc})
    $$

subject to \(\Delta_{abc}\) being a valid triangle.

Example of a triangle we can find in this sample of random interface.
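As a baseline for the “easy” question, here is a deliberately naive sketch (my own, not a proposed solution): it enumerates every activated edge, looks for a third vertex adjacent to both endpoints, and keeps the best non-degenerate triangle under the chosen objective. Here `V` maps vertex ids to coordinates and `active_edges` is a set of frozensets with \(\sigma(e) = 1\); each triangle is visited once per edge, which is fine for a baseline but leaves plenty of room for smarter search.

```python
# Naive search for the maximum-area (or maximum-perimeter) activated triangle (sketch).
import math

def best_triangle(V, active_edges, objective="area"):
    adj = {v: set() for v in V}                   # adjacency over activated edges only
    for e in active_edges:
        i, j = tuple(e)
        adj[i].add(j)
        adj[j].add(i)

    def signed_area2(a, b, c):
        pa, pb, pc = V[a], V[b], V[c]
        return (pb[0] - pa[0]) * (pc[1] - pa[1]) - (pb[1] - pa[1]) * (pc[0] - pa[0])

    def score(a, b, c):
        if objective == "perimeter":
            pa, pb, pc = V[a], V[b], V[c]
            return math.dist(pa, pb) + math.dist(pb, pc) + math.dist(pc, pa)
        return 0.5 * abs(signed_area2(a, b, c))

    best, best_score = None, -1.0
    for e in active_edges:
        a, b = tuple(e)
        for c in adj[a] & adj[b]:                     # third vertex adjacent to both
            if abs(signed_area2(a, b, c)) < 1e-12:    # skip degenerate (collinear) triples
                continue
            s = score(a, b, c)
            if s > best_score:
                best, best_score = (a, b, c), s
    return best, best_score

# toy usage; coordinates here are arbitrary, not an actual lattice patch
V = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.5, 0.9), 3: (1.5, 0.9)}
active = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (1, 3), (2, 3)]}
print(best_triangle(V, active))
```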

Harder Open Questions:

1) What are the scaling limit behaviors of the triangle structures that emerge as the lattice size grows?

In other words: As we consider larger and larger domains, does the global geometry of triangle configurations converge to some limiting structure or distribution?

Ideas to explore:

Do typical triangle shapes concentrate around certain forms?

Are there limiting ratios of triangle types (e.g., equilateral vs. non-equilateral)?

Does a macroscopic geometric object emerge (in the spirit of scaling limits in percolation, e.g., SLE curves on critical lattices)?

An attempt to formalise this would be as follows:

Let’s denote by \(T_p(L)\) the set of all triangles that can be formed inside a finite region of side length \(L\) on our probabilistic lattice, where each edge exists independently with probability \(p\).

Conjecture: As \(L\) becomes very large (tends to infinity), the distribution of triangle types (e.g. equilateral, isosceles, scalene, acute, obtuse, right-angled) converges to a well-defined limiting distribution that depends only on \(p\) (i.e., it converges in law to some limiting distribution \(\mu_p\)).

Interpretation of expected behaviours:

  • For high \(p\) (near 1), most triangles are close to equilateral.
  • For low \(p\) (near 0), triangles become rare or highly irregular.
  • This limiting distribution, \(\mu_p\) , tells us the “global shape profile” of triangles formed under randomness.
2) How does the distribution of triangle types (e.g., equilateral, isosceles, scalene; or acute, right, obtuse) vary with the probability parameter \(p\)?

Why this matters: At low \(p\), triangle formation is rare, and any triangle that does form may be highly irregular. As \(p\) increases, more symmetric (or complete) structures may become common.

Possible observations:

Are equilateral triangles more likely to survive at higher \(p\) due to more consistent connectivity?

Do scalene or right-angled triangles dominate at intermediate values of \(p\) due to partial disorder?

Is there a characteristic “signature” of triangle types at criticality?

3) Can we define an interesting phase transition in this setting, marking a shift in the global geometry of the triangle space?

In other words: As we gradually increase the edge probability $p$, at what point does the typical shape of the triangles we find change sharply?

An attempt to formalise this would be as follows:

There exist two critical probability values \(p_1\) and \(p_2\) (where \(0 < p_1 < p_2 < 1\)) that mark qualitative transitions in the kinds of triangles we typically observe.

Conjecture:

  • When \(p < p_1\), most triangles that do form are highly irregular (i.e scalene).
  • When \(p_1 < p < p_2\), the system exhibits a diverse mix of triangle types (now we have more equilateral, isosceles with less domination of scalene; and variations of acute, right-angled, and obtuse).
  • When \(p > p_2\), the triangles become increasingly regular, and equilateral triangles dominate.

This suggests a “geometric phase transition” in the distribution of triangle shapes, driven purely by the underlying connectivity probability.

4) Is there a percolation threshold at which large-scale triangle connectivity suddenly emerges?

Let’s define a triangle cluster as a set of triangles that are connected via shared edges or vertices (depending on your construction). We’re interested in whether such triangle clusters can become macroscopically large.

Conjecture:
There exists a critical probability \(p_c\) such that:

  • If \(p < p_c\), then all triangle clusters are finite (they don’t span the system).
  • If \(p > p_c\), then with high probability (as the system size grows), a giant triangle cluster emerges that spans the entire domain.

This is analogous to the standard percolation transition, but now with triangles instead of edge or site clusters.

(The first attempt setting up this was whether a single giant triangle can engulf the whole space but in retrospect, this would not be possible and is not such an interesting question.)

5) Results on Triangle Density

As we grow the system larger and larger, does the number of triangles per unit area settle down to a predictable average? And if so, how does this average depend on the edge probability \(p\)?

Let \(N_p(L)\) denote the total number of triangles formed within a square region of side length \(L\).

We define the triangle density as

$$
\rho_p(L) := \frac{N_p(L)}{L^2},
$$

which is the average number of triangles per unit area.

Conjecture: As \(L \to \infty\), the triangle density converges to a deterministic function \(\rho(p)\):
$$
\lim_{L \to \infty} \rho_p(L) = \rho(p).
$$
(Here, \(\rho(p)\) captures the asymptotic average density of triangles in the infinite lattice for edge probability \(p\).)

Furthermore, the variance of \(N_p(L)\) satisfies
$$
\mathrm{Var}[N_p(L)] = o(L^4),
$$
(i.e., the variance of the number of triangles grows more slowly than the square of the area)

which implies that the relative fluctuations
$$
\frac{\sqrt{\mathrm{Var}[N_p(L)]}}{L^2} \to 0
$$
vanish as \(L \to \infty\).

The function \(\rho(p)\) is analogous to the percolation density in classical bond percolation, which measures the expected density of occupied edges or clusters. Just as percolation theory studies the emergence and scaling of connected clusters, here \(\rho(p)\) characterizes the density of geometric structures (triangles) emerging from random edge configurations.

The sublinear variance scaling reflects a law of large numbers effect in spatial random graphs: as the region grows, the relative randomness in triangle counts smooths out, enabling well-defined macroscopic statistics.

This parallels known results in random geometric graphs and continuum percolation, where densities of subgraph motifs stabilize in large limits. This behavior may also hint at connections to scaling exponents or universality classes from statistical physics. (With certain constraints, one could classify the triangles or regions of connected triangles as interfaces and make connections with KPZ-type problems too; possibly another conjecture, but I think 5 problems to think about are more than enough for now.)


What are some applications?

While I approach this problem primarily from a theoretical lens, with an interest in its mathematical structure and the questions it naturally raises, I also recognize that others may find motivation in more tangible or applied outcomes. To speak to that perspective, I include two detailed examples that illustrate how ideas from this setting can extend into meaningful real-world contexts. The aim is not to restrict the problem’s scope to practical ends, but to show how the underlying tools and insights may resonate across domains and disciplines.

—–//—–

Application 1

1) Retopology of a 3D Scan

Consider the setting of high-resolution 3D scanning, where millions of sample points capture the geometry of a complex surface. These points are typically distributed (or essentially clumped) within an \(\varepsilon\)-thick shell around the true surface geometry, often due to the scan exhibiting local noise, redundancy, or sampling irregularities. Within this context, our framework finds a natural interpretation.

One can view the scan as a point cloud embedded on a virtual, warped hexagonal lattice, where the underlying lattice geometry adapts to local sampling density. By introducing a probability field, governed by factors such as local curvature, scan confidence, or spatial regularity, we assign likelihoods to potential connections between neighboring points.

In this formulation, the task of finding large, connected configurations of triangles corresponds to the process of identifying coherent, semantically meaningful patches of the surface. These patches serve as candidates for retopologization (the construction of a low-resolution, structured mesh that preserves the essential geometric features of the scan while filtering out noise and redundancies). This resulting mesh is significantly more amenable to downstream applications such as animation, simulation, or interactive editing.

The nearest-neighbor locality constraint we impose reflects a real physical prior, i.e mesh edges should not “jump” across unconnected regions of the surface. This constraint promotes local continuity and geometric surface fidelity.

Application 2

2) Fault Line Prediction in Material Fracture

In materials science, understanding how fractures initiate and propagate through heterogeneous media is a central problem. At microscopic scales, materials often have grain boundaries, impurities, or irregular internal structures that influence where and how cracks form. This environment can be abstracted as a disordered graph or lattice, where connections (or lack thereof) between elements model structural integrity or weakness.

We can reinterpret such a material as a random geometric graph laid over a deformed hexagonal lattice, with each connection representing a potential atomic or molecular bond. Assign a probability field to each edge, governed by local stress, strain, or material heterogeneity, capturing the likelihood that a bond remains intact under tension.

Triangles in this setting can be thought of as minimal stable configurations. As stress increases or defects accumulate, connected triangle structures begin to break down. By tracking the largest connected triangle regions, one may identify zones of high local integrity, while their complement reveals emergent fault lines, regions where fracture is likely to nucleate.

Thus, searching for maximal triangle clusters under stochastic constraints becomes a method for predicting failure zones before actual crack propagation occurs. Moreover, one can study how the geometry and size distribution of these triangle clusters vary with external stress or material disorder, giving rise to scaling laws or percolation-like transitions.

This formulation connects naturally with known models of fracture networks, but adds a new geometric lens: triangulated regions become proxies for mechanical stability, and their disintegration offers a pathway to understand crack emergence.

—–//—–

Again these are just two specific lenses. At its core, the problem, finding large-scale, connected substructures under local stochastic constraints, arises across many domains. It is easy to reformulate and adjust the constraints to use the tools developed here in other problems.


Auxiliary Insights

When constructing triangles, one can consider the set of paths accessible from a single vertex. To aid visualization, refer to the diagram below.

The blue paths are from the 1-close connections, while the pink paths correspond to 2-close connections. Together, they provide each vertex with a total of \(6 + 6 = 12\) possible directions or “degrees of freedom” along which it may propagate to form triangle edges.

Notably, the system exhibits a high degree of rotational symmetry: both the 1-close and 2-close path structures are invariant under rotations by \(60^\circ\). This symmetry can potentially be leveraged to design efficient search algorithms or classification schemes for triangle types within the lattice.
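For concreteness, a tiny sketch (assuming the \(30^\circ\) offset construction from earlier) that enumerates the twelve unit directions available at a vertex:

```python
# The 6 + 6 = 12 propagation directions at a vertex (sketch).
import math

def directions(offset_deg, count=6, step_deg=60):
    """Unit vectors at offset_deg, offset_deg + 60°, ..., offset_deg + 300°."""
    return [(math.cos(math.radians(offset_deg + k * step_deg)),
             math.sin(math.radians(offset_deg + k * step_deg)))
            for k in range(count)]

one_close = directions(0)     # the six "blue" 1-close directions
two_close = directions(30)    # the six "pink" 2-close directions, offset by 30 degrees
print(len(one_close) + len(two_close))   # 12 degrees of freedom per vertex
```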

Extension to the n-close case

So far, we’ve been using triangles as a proxy for a special class of simple closed-loop interaction pathways (i.e structures formed by chaining together local movements across the lattice). Specifically we restricted ourselves to the 1- and 2-close interaction setting.

Naturally, a question arises: can we generalize this further? Specifically, what sort of angular adjustment or geometric rule would be needed to define an n-close interaction, beyond the 1- and 2-close cases we’ve explored? (Looking above at the auxiliary insights and the construction of the 2-close, we can see that we use the \(30\)-degree rotation offset as the basis.)

This leads to a broader and richer family of questions. In my other blog post (will be linked soon after it is posted) , I’ll pose precisely this: “How can we extend the 2-close interaction to a general n-close interaction?” . There, I’ll also provide a closed-form formula for generating any term in the sequence of such $n$-close interactions.

As always, I encourage you to attempt the question yourself before reading the solution, it may seem straightforward at first, but the correct sequence or geometric structure is not always immediately obvious.

Final Thoughts

Crafting this problem, presenting some drawings and formalising certain aspects was an interesting challenge for me. As always, I welcome any feedback on how to improve the presentation or to formalise certain sections more rigorously if you have any ideas. I’m especially looking forward to seeing and reading about any progress made on the questions posed here, attempts at their variations, or even presentation of potential applications.

Cheers and have an hex-traordinary day ahead!

Srivathsan Amruth
i.e., the writer of this masterpiece chaotic word salad, who apologizes for your lost minutes
Hey, why are you still reading? … Shouldn’t you be pretending to work? ;-)

Topology Control: Pathfinding for Genus Preservation (2/2)

SGI Fellows:
Stephanie Atherton, Marina Levay, Ualibyek Nurgulan, Shree Singhi, Erendiro Pedro

Recall (from post 1/2) that while DeepSDFs are a powerful shape representation, they are not inherently topology-preserving. By “preserving topology,” we mean maintaining a specific topological invariant throughout shape deformation. In low-dimensional latent spaces (e.g., \(z_{\text{dim}} = 2\)), linear interpolation between two topologically similar shapes can easily stray into regions with a different genus, breaking that invariant.

Visualizing genera count in 2D latent space

Higher-dimensional latent spaces may offer more flexibility—potentially containing topology-preserving paths—but these are rarely straight lines. This points to a broader goal: designing interpolation strategies that remain within the manifold of valid shapes. Instead of naive linear paths, we focus on discovering homotopic trajectories—routes that stay on the learned shape manifold while preserving both geometry and topology. In this view, interpolation becomes a trajectory-planning problem: navigating from one latent point to another without crossing into regions of invalid or unstable topology.

Pathfinding in Latent Space

Suppose you are a hiker hoping to reach a point \(A\) from a point \(B\) amidst a chaotic mountain range. Now your goal is to plan your journey so that there will be minimal height change in your trajectory, i.e., you are a hiker that hates going up and down much! Fortunately, an oracle gives us a magical device, let us call it \(f\), that can give us the exact height of any point we choose. In other words, we can query \(f(x)\) for any position \(x\), and this device is differentiable, \(f'(x)\) exists!

Metaphors aside, the problem of planning a path from a particular latent vector \(A\) to another \(B\) in the learned latent space would greatly benefit from another auxiliary network that learns the mapping from the latent vectors to the desired topological or geometric features. We will introduce a few ways to use this magical device – a simple neural network.

Gradient as the Compass

Now the best case scenario would be to stay at the same height for the whole of the journey, no? This greedy approach puts a hard constraint on the problem, but it also greatly reduces the possible space of paths to take. For our case, we would love to move toward the direction that does not change our height, and this set of directions precisely forms the nullspace of the gradient.

No height change in mathematical terms means that we look for directions where the derivatives equal zero, as in

$$D_vf(x) = \nabla f(x)\cdot v = 0,$$


where the first equality is a fact of calculus and the second shows that any desirable direction satisfies \(v\in \text{null}(\nabla f(x))\).

This does not give any definitive directions for a position \(x\) but a set of possible good directions. Then a greedy approach is to take the direction that aligns most with our relative position against the goal – the point \(B\).

Almost done, but the problem is that if we always take steps with the assumption that we are on the correct isopath, we would end up accumulating errors. So, we actually want to nudge ourselves a tiny bit along the gradient to land back on the isopath. Toward this, let \(x\) be the current position and \(\Delta x=\alpha \nabla f(x) + n\), where \(n\) is the projection of \(B-x\) to \(\text{null}(\nabla f(x))\). Then we want \(f(x+\Delta x) = f(B)\). Approximate

$$f(B) = f(x+\Delta x) \approx f(x) + \nabla f(x)\cdot \Delta x,$$

and substituting for \(\Delta x\), see that

$$f(B) \approx f(x) + \nabla f(x) \cdot (\alpha\nabla f(x) + n)
\quad \Longleftrightarrow \quad
\alpha \approx \frac{f(B) - f(x)}{|\nabla f(x)|^2}.$$

Now we just take \(\Delta x = \frac{f(B) - f(x)}{|\nabla f(x)|^2}\nabla f(x) + \text{proj}_{\text{null}(\nabla f(x))} (B- x)\) for each position \(x\) on the path.
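A compact sketch of this update rule (my own illustration; `f` and `grad_f` stand in for the auxiliary network and its autograd gradient), exercised on a toy height field whose level sets are circles:

```python
# Walk from x toward B while staying on the level set f(x) = f(B) (sketch).
import numpy as np

def isopath_step(x, B, f, grad_f, step=0.05):
    g = grad_f(x)
    gnorm2 = float(np.dot(g, g)) + 1e-12

    # component in the nullspace of grad f: move toward B without changing f
    toward_B = B - x
    null_component = toward_B - (np.dot(toward_B, g) / gnorm2) * g

    # small correction along grad f to land back on the target level set f(B)
    alpha = (f(B) - f(x)) / gnorm2
    return x + step * null_component + alpha * g

# toy height field: f(x) = x0^2 + x1^2, so level sets are circles around the origin
f = lambda x: float(x[0]**2 + x[1]**2)
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

x, B = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for _ in range(100):
    x = isopath_step(x, B, f, grad_f)
print(x, f(x))   # x walks along the unit circle toward B while f(x) stays close to 1
```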

Results. Latent to expected volume.

Optimizing Vertical Laziness

Sometimes it is impossible to hope for the best case scenario. Even when \(f(A)=f(B)\), it might happen that the latent space is structured such that \(A\) and \(B\) are in different components of the level set of the function \(f\). Then there is no hope for a smooth hike without ups and downs! But the situation is not completely hopeless if we are fine with taking detours as long as such undesirables are minimized. We frame the problem as

$$\text{argmin}_{x_i \in L, \forall i} \sum_{i=1}^n |f(x_i) - f(B)|^2 + \lambda\sum_{i=1}^{n-2} |(x_{i+2}-x_{i+1}) - (x_{i+1} - x_i)|^2,
$$

where \(L\) is the latent space and \(\{x_i\}_{i=1}^n\) is the sequence defining the path such that \(x_1 = A\) and \(x_n=B\). The first term is to encourage consistency in the function values, while the second term discourages sudden changes in curvature, thereby ensuring smoothness. Once defined, various gradient-based optimization algorithms can be used on this problem, which is now imposed with a relatively soft constraint.
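One way this soft-constrained objective could be set up in PyTorch is sketched below (my own illustration, with made-up names and hyperparameters); `f` is assumed to be a differentiable network mapping latent codes to the feature of interest, and only the interior points of the path are optimized.

```python
# Optimize a latent path between fixed endpoints A and B (sketch).
import torch

def optimize_path(A, B, f, n=16, lam=1.0, steps=500, lr=1e-2):
    ts = torch.linspace(0, 1, n)[1:-1].unsqueeze(1)          # interior parameters
    interior = (A * (1 - ts) + B * ts).clone().requires_grad_(True)
    target = f(B.unsqueeze(0)).detach()

    opt = torch.optim.Adam([interior], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        path = torch.cat([A.unsqueeze(0), interior, B.unsqueeze(0)], dim=0)
        consistency = ((f(path) - target) ** 2).sum()        # |f(x_i) - f(B)|^2 term
        second_diff = path[2:] - 2 * path[1:-1] + path[:-2]  # discrete curvature
        smoothness = (second_diff ** 2).sum()
        (consistency + lam * smoothness).backward()
        opt.step()
    return torch.cat([A.unsqueeze(0), interior.detach(), B.unsqueeze(0)], dim=0)

# toy usage: f measures the squared norm of the latent code
f = lambda z: (z ** 2).sum(dim=1)
A, B = torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])
path = optimize_path(A, B, f)
print(path.shape)   # (16, 2): a smooth path whose f-values hug f(B) = 1
```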

Results of pathfinding via optimization in different contexts.

Latent to expected volume.
Latent to expected genus.

A Geodesic Path

One alternative attempt of note to optimize pathfinding used the concept of geodesics: the shortest, smoothest paths between two points on a Riemannian manifold. In the latent space, one is really working over a latent data manifold, so, thinking geometrically, we experimented with a geodesic path algorithm. In writing this algorithm, we set our two latent endpoints and optimized the points in between by minimizing the energy of our path on the output manifold. This alternative method worked similarly well to the optimized pathfinding algorithm posed above!

Conclusion and Future Work

Our journey into DeepSDFs, gradient methods, and geodesic paths has shown both the promise and the pitfalls of blending multiple specialized components. The decoupled design gave us flexibility, but also revealed a fragility—every part must share the same initialized latent space, and retraining one element (like the DeepSDF) forces a retraining of its companions, the regressor and classifier. With each component carrying its own error, the final solution sometimes felt like navigating with a slightly crooked compass.

Yet, these challenges are also opportunities. Regularization strategies such as Lipschitz constraints, eikonal terms, and Laplacian smoothing may help “tame” topological features, while alternative frameworks like diffeomorphic flows and persistent homology could provide more robust topological guarantees. The full integration of these models remains a work in progress, but we hope that this exploration sparks new ideas and gives you, the reader, a sharper sense of how topology can be preserved—and perhaps even mastered—in complex geometric learning systems.

Genus is a commonly known and widely applied shape feature in 3D computer graphics, but there is also a whole mathematical menu of topological invariants to explore along with their preservation techniques! Throughout our research, we also considered:

  • Would topological accuracy benefit from a non-differentiable algorithm (e.g. RRT) that can directly encode genus as a discrete property? How would we go about certifying such an algorithm?
  • How can homotopy continuation be used for latent space pathfinding? How will this method complement continuous deformations of homotopic and even non-homotopic shapes?
  • What are some use cases to preserve topology, and what choice of topological invariant should we pair with those cases?

We invite all readers and researchers to take into account our struggles and successes in considering the questions posed.

Acknowledgements. We thank Professor Kry for his mentorship, Daria Nogina and Yuanyuan Tao for many helpful huddles, Professor Solomon for his organization of great mentors and projects, and the sponsors of SGI for making this summer possible.

References

DeepSDF

SDFConnect


Topology Control: Training a DeepSDF (1/2)

SGI Fellows:
Stephanie Atherton, Marina Levay, Ualibyek Nurgulan, Shree Singhi, Erendiro Pedro

In the Topology Control project mentored by Professor Paul Kry and project assistants Daria Nogina and Yuanyuan Tao, we sought to explore preserving topological invariants of meshes within the framework of DeepSDFs. Deep Signed Distance Functions are a neural implicit representation used for shapes in geometry processing, but they don’t come with the promise of respecting topology. After finishing our ML pipeline, we explored various topology-preserving techniques through our simple, initial case of deforming a “donut” (torus) into a mug.

DeepSDFs

Signed Distance Field (SDF) representation of a 3D bunny. The network predicts the signed distance from each spatial point to the surface. Source: (Park et al., 2019).

Signed Distance Functions (SDFs) return the shortest distance from any point in 3D space to the surface of an object. Their sign indicates spatial relation: negative if the point lies inside, positive if outside. The surface itself is defined implicitly as the zero-level set: the locus where \(\text{SDF}(x) = 0 \).

In 2019, Park et al. introduced DeepSDF, the first method to learn a continuous SDF directly using a deep neural network (Park et al., 2019). Given a shape-specific latent code \( z \in \mathbb{R}^d \) and a 3D point \( x \in \mathbb{R}^3 \), the network learns a continuous mapping:

$$
f_\theta(z_i, x) \approx \text{SDF}^i(x),
$$

where \( f_\theta \) takes a latent code \( z_i \) and a 3D query point \( x \) and returns an approximate signed distance.

The training set is defined as:

$$
X := \{(x, s) : \text{SDF}(x) = s\}.
$$

Training minimizes the clamped L1 loss between predicted and true distances:

$$
\mathcal{L}\bigl(f_\theta(x), s\bigr)
= \bigl|\text{clamp}\bigl(f_\theta(x), \delta\bigr) - \text{clamp}(s, \delta)\bigr|
$$

with the clamping function:

$$
\text{clamp}(x, \delta) = \min\bigl(\delta, \max(-\delta, x)\bigr).
$$

Clamping focuses the loss near the surface, where accuracy matters most. The parameter \( \delta \) sets the active range.
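In code, the clamped L1 loss is only a few lines; the PyTorch sketch below is my own transcription of the formula above (the default \(\delta\) matches the SDF Delta listed in the training parameters), not the project's reference implementation.

```python
# Clamped L1 loss between predicted and true signed distances (sketch).
import torch

def clamped_l1(pred, true_sdf, delta=1.0):
    clamp = lambda x: torch.clamp(x, min=-delta, max=delta)
    return torch.abs(clamp(pred) - clamp(true_sdf)).mean()

# toy usage: with a tight delta, samples far from the surface are saturated
pred     = torch.tensor([0.03, -1.5,  0.2])
true_sdf = torch.tensor([0.05, -1.4, -0.2])
print(clamped_l1(pred, true_sdf, delta=0.1))
```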

This is trained on a dataset of 3D point samples and corresponding signed distances. Each shape in the training set is assigned a unique latent vector \( z_i \), allowing the model to generalize across multiple shapes.

Once trained, the network defines an implicit surface through its decision boundary, precisely where \( f_\theta(z, x) = 0 \). This continuous representation allows smooth shape interpolation, high-resolution reconstruction, and editing directly in latent space.

Training Field Notes

We sampled training data from two meshes, torus.obj and mug.obj, using a mix of blue-noise points near the surface and uniform samples within a unit cube. All shapes were volume-normalized to ensure consistent interpolation.

DeepSDF is designed to intentionally overfit. Validation is typically skipped. Effective training depends on a few factors: point sample density, network size, shape complexity, and sufficient epochs.

After training, the implicit surface can be extracted using Marching Cubes or Marching Tetrahedra to obtain a polygonal mesh from the zero-level set.

Training Parameters

  • SDF Delta: 1.0
  • Latent Mean: 0.0
  • Latent SD: 0.01
  • Loss Function: Clamped L1
  • Optimizer: Adam
  • Network Learning Rate: 0.001
  • Latent Learning Rate: 0.01
  • Batch Size: 2
  • Epochs: 5000
  • Max Points per Shape: 3000

Network Architecture

  • Latent Dimension: 16
  • Hidden Layer Size: 124
  • Number of Layers: 8
  • Input Coordinate Dim: 3
  • Dropout: 0.0

Point Cloud Sampling

  • Radius: 0.02
  • Sigma: 0.02
  • Mu: 0.0
  • Number of Gaussians: 10
  • Uniform Samples: 5000

For higher shape complexity, increasing the latent dimension or training duration improves reconstruction fidelity.

Latent Space Interpolation

One compelling application is interpolation in latent space. By linearly blending between two shape codes \( z_a \) and \( z_b \), we generate new shapes along the path

$$
z(t) = (1 - t) \cdot z_a + t \cdot z_b,\quad t \in [0,1].
$$

Latent space interpolation between mug and torus.
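
In code, the blend is a one-liner per step; decoding each blended code with the hypothetical `extract_mesh` helper sketched earlier yields the interpolated surfaces.

```python
import torch

def interpolate_shapes(decoder, z_a, z_b, steps=10):
    """Decode meshes along the straight line z(t) = (1 - t) z_a + t z_b."""
    meshes = []
    for t in torch.linspace(0.0, 1.0, steps):
        z_t = (1 - t) * z_a + t * z_b
        meshes.append(extract_mesh(decoder, z_t))  # extract_mesh as sketched above
    return meshes
```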

While DeepSDF enables smooth morphing between shapes, it exposes a core limitation: a lack of topological consistency. Even when the source and target shapes share the same genus, interpolated shapes can exhibit unintended holes, handles, or disconnected components. These are not mere rendering artifacts; they reveal that the model has no built-in notion of topology.

However, this limitation also opens the door for deeper exploration. If neural fields like DeepSDF lack an inherent understanding of topology, how can we guide them toward preserving it? In the next post, we explore a fundamental topological property—genus—and how maintaining it during shape transitions could lead us toward more structurally meaningful interpolations.

References

Park, J. J., Florence, P., Straub, J., Newcombe, R., & Lovegrove, S. (2019). DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

Categories
Research

Quasi-harmonic Geodesic Distance

I kicked off this project with my advisor, Yu Wang, over Zoom the weekend before our official start, checked in mid-week to ask questions and report progress, and wrapped up the week with a final Zoom review, each meeting sharpening my understanding of the underlying problem and of how to improve my Python implementation. I’m excited to share these quasi-harmonic ideas and how a blend of PDE insight and mesh-based propagation can yield both speed and exactness in geodesic computation.

Finding the shortest path on a curved surface—called the geodesic distance—is deceptively challenging. Exact methods track how a wavefront would travel across every triangle of the mesh, which is accurate but can be painfully slow on complex shapes. The Heat Method offers a clever shortcut: imagine dropping a bit of heat at your source point, let it diffuse for a moment, then “read” the resulting temperature field to infer distances. By solving two linear systems—one for the heat spread and one to recover a distance‐like potential—you get a fast, global approximation. It runs in near-linear time and parallelizes beautifully, but it can smooth over sharp creases and slightly misestimate in highly curved regions.
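
For readers who want to see the two solves spelled out, here is a minimal sketch of the plain Heat Method (not the crease-aware hybrid described below). It assumes the libigl Python bindings (`igl`) supply the standard discrete operators; sign and scale conventions differ between implementations, so treat this as illustrative rather than definitive.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
import igl  # assumed: libigl Python bindings

def heat_method_distance(V, F, source):
    """Approximate geodesic distance from vertex `source` (Crane et al.'s Heat Method)."""
    n = V.shape[0]
    L = igl.cotmatrix(V, F)                                # cotangent Laplacian (negative semi-definite)
    M = igl.massmatrix(V, F, igl.MASSMATRIX_TYPE_VORONOI)  # lumped vertex areas
    G = igl.grad(V, F)                                     # gradient operator, stacked [Gx; Gy; Gz]
    dblA = igl.doublearea(V, F)                            # twice the triangle areas

    # Solve 1: diffuse a spike of heat at the source for a short time t
    t = igl.avg_edge_length(V, F) ** 2
    u0 = np.zeros(n); u0[source] = 1.0
    u = spla.spsolve(sp.csc_matrix(M - t * L), u0)

    # The normalized negative heat gradient points away from the source
    g = (G @ u).reshape((F.shape[0], 3), order="F")
    X = -g / (np.linalg.norm(g, axis=1, keepdims=True) + 1e-12)

    # Solve 2: recover a distance-like potential from the Poisson equation L * phi = div(X)
    Div = -0.5 * G.T @ sp.diags(np.tile(dblA, 3))          # integrated divergence, adjoint to G
    A = sp.csc_matrix(L - 1e-8 * sp.identity(n))           # tiny shift: L alone is singular (constants)
    phi = spla.spsolve(A, Div @ X.flatten(order="F"))
    return phi - phi[source]                               # distances, up to the usual sign convention
```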

To sharpen accuracy where it matters, I adopted a hybrid strategy. First, detect “barrier” edges—those sharp creases where two faces meet at a steep dihedral angle—and temporarily slice the mesh along them. Then apply the Heat Method independently on each nearly‐flat patch, pinning the values along the cut edges to their true geodesic distances. Finally, stitch everything back together by running a precise propagation step only along those barrier edges. The result is a distance field that retains the Heat Method’s speed and scalability on smooth regions, yet achieves exactness along critical creases. It’s remarkable how something as seemingly simple as measuring distance on a surface can lead into rich territory—mixing partial-differential equations, sparse linear algebra, and discrete geometry in pursuit of both efficiency and precision.
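
The first step of that hybrid, flagging crease (“barrier”) edges by how sharply two faces meet, can be sketched as below; the threshold and the use of face-normal angles as a crease measure are choices of ours, not the only option.

```python
import numpy as np
import igl  # assumed: libigl Python bindings

def barrier_edges(V, F, angle_thresh_deg=60.0):
    """Return vertex pairs of interior edges whose adjacent faces bend sharply."""
    N = igl.per_face_normals(V, F, np.zeros(3))   # unit normal per face
    TT, _ = igl.triangle_triangle_adjacency(F)    # TT[f, j]: face across edge j of face f
    edges = []
    for f in range(F.shape[0]):
        for j in range(3):
            g = TT[f, j]
            if g < 0 or g < f:                    # skip boundary edges and already-visited pairs
                continue
            cos_angle = np.clip(np.dot(N[f], N[g]), -1.0, 1.0)
            if np.degrees(np.arccos(cos_angle)) > angle_thresh_deg:
                edges.append((F[f, j], F[f, (j + 1) % 3]))
    return np.array(edges)
```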

Categories
Research

From Calculus to \(k\)-forms

Differential geometry makes heavy use of calculus in order to study geometric objects such as smooth manifolds. Just as differential geometry builds upon calculus and other fundamentals, the notion of differential forms extends the approaches of differential geometry and calculus, even allowing us to study the larger class of surfaces that are not necessarily smooth. Starting from calculus, we will take a brief brush with differential forms and their geometric potential.

Review of Vector Fields

Recall that a vector field is a special case of a multivariable vector-valued function: it takes in a point (described by a scalar or a vector of coordinates) and outputs a vector at that point. Often vector fields are defined in \(\mathbb{R}^2\) or \(\mathbb{R}^3\), but they can also be defined more generally on smooth manifolds.

In three dimensional space, a vector field is a function \(V\) that assigns to each point \((x, y, z)\) a three dimensional vector given by \(V(x, y, z)\). Consider our identity function

$$ V(x, y, z) = \begin{bmatrix}x\\y\\z \end{bmatrix}, $$

where for each point \((x, y, z)\) we associate a vector with those same coordinates. Generalizing, for any vector function \(V(x, y, z)\), we visualize its vector field by placing the tail of the corresponding vector \(V(x_0, y_0, z_0)\) at its point \((x_0, y_0, z_0)\) in the domain of \(V(x, y, z)\).

Here we consider fluid particles (displayed as blue dots) as points, each with its own velocity, whose magnitude and direction correspond to the overall fluid flow. Then to each point in our sample of particles we can assign a vector whose direction is the particle’s direction of motion and whose length is its speed. Each arrow not only represents the velocity of the individual particle it is attached to, but also reflects how the neighborhood of particles around it moves.

Scalar fields can be transformed into vector fields via \(\nabla\), a vector differential operator known as the gradient operator.

Our gradient points in the direction of steepest ascent of a scalar function \(f(x_1, x_2, x_3, \ldots, x_n)\), indicating the direction of greatest change of \(f\). A vector field \(V\) over an open set \(S\) is a gradient field if there exists a differentiable real-valued function (the scalar field) \(f\) on \(S\) such that \(V = \nabla f\). Then

$$V = \nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \frac{\partial f}{\partial x_3}, \ldots, \frac{\partial f}{\partial x_n} \right)$$

is a special type of vector field \(V\) defined as the gradient of a scalar function, where each \( v \in V\) is the gradient of \(f\) at that point.

The Divergence and Curl Operators

Now that we have reviewed vector fields, recall that we can also perform operations like divergence and curl over them. When fluid flows over time, some regions of fluid will become less dense with particles as they flow away (positive divergence). Meanwhile, particles tending towards each other will cause the fluid in that region to become more dense (negative divergence).

Luckily, we have our handy divergence operator to take in the vector-valued function of our vector field and output a scalar-valued function measuring the change in density of the fluid around each point. Given a vector field \(V\), we can think of divergence

$$\operatorname{div} V = \nabla \cdot V = \frac{\partial V_1}{\partial x} + \frac{\partial V_2}{\partial y} + \frac{\partial V_3}{\partial z} + \cdots$$

as a variation of the derivative that generalizes to vector fields \(V\) of any dimension. Another “trick” to thinking about the divergence formula is as the trace of the Jacobian matrix of the vector field.
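
Both views are easy to check symbolically. Here is a small SymPy example on an arbitrary field of our choosing, computing the divergence directly and as the trace of the Jacobian:

```python
import sympy as sp
from sympy.vector import CoordSys3D, divergence

N = CoordSys3D('N')
V = N.x*N.y*N.i + sp.sin(N.z)*N.j + N.x*N.z**2*N.k   # an arbitrary smooth vector field
print(divergence(V))                                  # N.y + 2*N.x*N.z

x, y, z = sp.symbols('x y z')
J = sp.Matrix([x*y, sp.sin(z), x*z**2]).jacobian([x, y, z])
print(sp.simplify(J.trace()))                         # y + 2*x*z: the same expression
```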

In differential geometry, one can also define divergence on a Riemannian manifold \((M, g)\), a smooth manifold \(M\) endowed with the Riemannian metric \(g\) as a choice of inner product for each tangent space of \(M\).

The Riemannian metric allows us to take the inner product of the two vectors tangent to the sphere.
Every smooth manifold admits a Riemannian metric.

Whereas divergence describes changes in density, curl measures the “rotation”, or angular momentum, of the vector field \(V\). Likewise, the curl operator takes in a vector-valued function and outputs a function that describes the flow of rotation given by the vector field at each point.

Here, all the points go in circles in the counterclockwise direction, which we define by \(V(x, y) = \begin{bmatrix} -y \\ x \end{bmatrix} = -y\hat{i} + x\hat{j}\).

For a vector field function

$$V(x, y) = v_1(x, y)\hat{i} + v_2(x, y)\hat{j},$$

we can measure the rotation around a point \((x_0, y_0)\) by looking at the change in \(v_2\) as \(x\) changes and the change in \(v_1\) as \(y\) changes. We calculate

$$\text{2d curl}\, V = \nabla \times V = \frac{\partial v_2}{\partial x}(x_0, y_0) - \frac{\partial v_1}{\partial y}(x_0, y_0),$$

with a positive evaluation indicating counterclockwise rotation around \((x_0, y_0)\) and a negative evaluation indicating clockwise.
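
As a quick sanity check, evaluating this formula on the counterclockwise field above gives a positive constant:

```python
import sympy as sp

x, y = sp.symbols('x y')
v1, v2 = -y, x                            # the field V(x, y) = (-y, x) shown earlier
print(sp.diff(v2, x) - sp.diff(v1, y))    # 2: positive everywhere, i.e. counterclockwise rotation
```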

Approach by Differential Forms

Vector calculus can also be viewed more generally and geometrically via the theory of differential forms.

While curl is typically only defined in two and three dimensions, it can be generalized in the context of differential forms. The calculus of differential \(k\)-forms offers a unified approach to define integrands over curves, surfaces, and higher-dimensional manifolds. Differential 1-forms of the form

$$\omega = f(x)dx$$

are defined on an open interval \( U \subseteq \mathbb{R}\) with \(f: U \to \mathbb{R}\) a smooth function. We can have 1-forms

$$\omega = f(x, y)dx + g(x, y)dy$$

defined on an open subset \( U \subseteq \mathbb{R}^2\),

$$\omega = f(x, y, z)dx + g(x, y, z)dy + h(x, y, z)dz $$

defined on an open subset \( U \subseteq \mathbb{R}^3\), and so on for any positive integer \(n\).

Correspondence Theorem: Given a 1-form \(\omega = f_1dx_1 + f_2dx_2 + \cdots + f_ndx_n\), we can define an associated vector field with component functions \((f_1, f_2, \ldots, f_n)\). Conversely, given a smooth vector field \(V = (f_1, f_2, \ldots, f_n)\), we can define an associated 1-form \(\omega = f_1dx_1 + f_2dx_2 + \cdots + f_ndx_n\).

Via the exterior derivative and an extension of the correspondence between 1-forms and vector fields to arbitrary \(k\)-forms, we can generalize the well-known operators of vector calculus.

The exterior derivative \(d\omega\) of a \(k\)-form is simply a \(k + 1\)-form. We provide an example of its use in defining curl:

Let \(\omega = f(x, y, z)dx + g(x, y, z)dy + h(x, y, z)dz\) be a 1-form on \( U \subseteq \mathbb{R}^3\) with its associated vector field \(V = (f, g, h)\). Its exterior derivative \(d\omega\) is the 2-form

$$d\omega = \left(\frac{\partial h}{\partial y} - \frac{\partial g}{\partial z}\right) dy \wedge dz + \left(\frac{\partial f}{\partial z} - \frac{\partial h}{\partial x}\right) dz \wedge dx + \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right) dx \wedge dy,$$

where \(\wedge\) is an associative operation known as the exterior product. We define the curl of \(V\)

$$ \nabla \times V = \left(\frac{\partial h}{\partial y} - \frac{\partial g}{\partial z},\ \frac{\partial f}{\partial z} - \frac{\partial h}{\partial x},\ \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)$$

as the vector field associated to the 2-form \(d\omega\). Check that this definition of curl agrees with the traditional version, yet can be applied in higher dimensions.
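
One way to convince yourself of the agreement is to compare the components symbolically; the 1-form below is an arbitrary example of ours:

```python
import sympy as sp
from sympy.vector import CoordSys3D, curl

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z
f, g, h = y*z, x*z**2, sp.sin(x*y)        # coefficients of the 1-form w = f dx + g dy + h dz

# Components of the 2-form dw, read off the formula above
comps = [sp.diff(h, y) - sp.diff(g, z),
         sp.diff(f, z) - sp.diff(h, x),
         sp.diff(g, x) - sp.diff(f, y)]

# They match the classical curl of the associated vector field V = (f, g, h)
V = f*N.i + g*N.j + h*N.k
print([sp.simplify(curl(V).dot(e) - c) for e, c in zip((N.i, N.j, N.k), comps)])  # [0, 0, 0]
```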

The natural pairing between vector fields and 1-forms allows for the modern approach of \(k\)-forms for multivariable calculus and has also heavily influenced the area of geometric measure theory (GMT), where we define currents as functionals on the space of \(k\)-forms. Via currents, we have the impressive result that a surface can be uniquely determined by the values obtained from integrating differential forms over it. Further, the differential forms fundamental to GMT open up our toolbox to studying geometry that is not necessarily smooth!

Acknowledgements. We thank Professor Chien, Erick Jiminez Berumen, and Matheus Araujo for exposing us to new math, questions, and insights, as well as for their mentorship and dedication throughout the “Developing a Medial Axis for Measures” project. We also acknowledge Professor Solomon for suggesting edits incorporated into this post, as well as for his organization of the wonderful summer program in the first place!


Categories
Research

A Derivation of the Force Density Method

Thanks to Ioannis Mirtsopoulos, Elshadai Tegegn, Diana Avila, Jorge Rico Esteban, and Tinsae Tadesse for invaluable assistance with this article.

A truss is a collection of rigid beams connected together with joints which can rotate, but not slip by each other. Trusses are used throughout engineering, architecture, and other structural design; you’ll see them in railway bridges, supporting ceilings, and forming the foundation for many older roller coasters. When designing trusses, it is often necessary to design for static equilibrium; that is, such that the forces at each joint in the truss sum to zero and the truss does not move. One of the most effective ways to accomplish this is by parametrizing our graphs through the ratios between the axial force along each edge and the length of that edge. Our goal for this article is to show that these force densities parametrize graphs in static equilibrium, subject to certain constraints.

The first constraint we will place upon our system is topological. Recall that a graph is a collection of vertices, together with a collection of edges linking pairs of those vertices. We will fix, once and for all, a graph \(G\) to represent the topology of our truss. We will also designate some of the vertices of our graph as fixed vertices and the rest as free vertices. The fixed vertices represent vertices which are attached to some solid medium external to the truss, and so experience normal forces which cancel out any forces which are present from the loading of the truss. In addition to these fixed vertices, to each free vertex we assign a load, which represents the force placed on that point.

First we introduce the incidence matrix of a (directed) graph. Recall that a directed graph is one whose edges have a distinguished ‘first’ vertex; that is, the edges are represented by ordered pairs rather than sets. Given such a graph, its incidence matrix is the matrix having in the \(i,j\)-th position a \(1\) if the \(i\)-th edge begins with the \(j\)-th vertex, a \(-1\) if the \(i\)-th edge ends with the \(j\)-th vertex, and a \(0\) otherwise. We split the full incidence matrix \(A\) of our graph \(G\) into two matrices, \(C\) and \(C_f\), where \(C\) consists of the columns of \(A\) which correspond to free vertices, and \(C_f\) consists of those columns corresponding to fixed vertices. We will also work with the vectors \(p_x\) and \(p_y\) of applied forces, and the vectors \(x_f\) and \(y_f\) of fixed-vertex coordinates. We wish to construct a map \[F:\mathscr Q \to \mathscr C\]
where \(\mathscr Q\) is the space of vectors \(q = (q_1, …, q_n)\) of force-length ratios, and \(\mathscr C\) is the space of all possible lists of coordinate pairs \((x, y) = (x_1, …, x_n, y_1, …, y_n)\). It turns out that the map we want is \[F(q) = \left(D^{-1}(p_x - D_fx_f),\ D^{-1}(p_y - D_fy_f)\right),\]
where we set \(D = C^TQC\), \(D_f = C^TQC_f\), and \(Q\) is the diagonal matrix with the \(q_i\) along the diagonal. This map is rational in the \(q_i\), and is regular away from the locus where \(\det(D) = 0\) (when \(D\) is viewed as a function of \(q\)).

Derivation

We suppose we have a system with free coordinate vectors \((x, y) = (x_1, …, x_n, y_1, …, y_n)\) and fixed coordinate vectors \((x_f, y_f) = (x_{1f}, …, y_{1f}, …)\) which is in static equilibrium. First note that we can easily obtain the vectors of coordinate differences \(u\) and \(v\) (in the \(x\) and \(y\) directions respectively) using the incidence matrices \(C\) and \(C_f\).
\[u = Cx + C_fx_f\]
\[v = Cy + C_fy_f\]
We form the diagonal matrices \(U\) and \(V\) corresponding to \(u\) and \(v\), letting \(L\) be the matrix with the lengths of the edges down the diagonal and \(s\) the vector of axial forces along each edge. We note that if we divide the axial force along each edge by that edge’s length, and then multiply by the length in the \(x\) or \(y\) direction, we have then calculated the force that edge contributes to each vertex at its ends. Multiplying on the left by the transpose of the incidence matrix \(C\) then gives the total force on the vertex from all the bars coincident with it, which must then be equal to the external load applied. This yields the equations
\[C^TUL^{-1}s = p_x,\ \ \ C^TVL^{-1}s = p_y,\]
which hold if and only if the system is in equilibrium. We define \(q=L^{-1}s\) to be the vector of force densities, simplifying our equations to
\[C^TUq = p_x,\ \ \ C^TVq = p_y.\]
We note that \(Uq = Qu\) and \(Vq=Qv\), by the definition of matrix multiplication (each matrix is a diagonal matrix!). Thus we obtain by the definitions of \(u\) and \(v\)
\[p_x = C^TUq = C^TQ(Cx+C_fx_f) = C^TQCx+C^TQC_fx_f, \]
\[p_y = C^TVq = C^TQ(Cy+C_fy_f) = C^TQCy+C^TQC_fy_f.\]
We set, as above, \(D = C^TQC\) and \(D_f = C^TQC_f\), yielding
\[p_x - D_fx_f = Dx, \]
\[p_y - D_fy_f = Dy,\]
or as promised
\[D^{-1}(p_x - D_fx_f) = x,\]
\[D^{-1}(p_y - D_fy_f) = y.\]
Clearly these equations hold if and only if the system is in equilibrium. Since every system (in equilibrium or not) has force densities, this yields an important corollary: every system which is in equilibrium is of the form \(F(q)\) for some choice of force-density ratios \(q\), and no system which is not in equilibrium is of such a form.
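
The map \(F(q)\) derived above is only a few lines of linear algebra. Here is a minimal NumPy sketch together with a toy example of one free node hanging between two supports; all names and the example data are ours.

```python
import numpy as np

def fdm_equilibrium(C, C_f, q, p_x, p_y, x_f, y_f):
    """Force Density Method: return free coordinates (x, y) of the equilibrium configuration."""
    Q = np.diag(q)              # force densities on the diagonal
    D = C.T @ Q @ C
    D_f = C.T @ Q @ C_f
    x = np.linalg.solve(D, p_x - D_f @ x_f)
    y = np.linalg.solve(D, p_y - D_f @ y_f)
    return x, y

# Toy example: one free vertex, two edges to fixed supports at (0, 0) and (2, 0),
# unit force densities, and a unit downward load on the free vertex.
C   = np.array([[1.0], [1.0]])               # free-vertex columns of the incidence matrix
C_f = np.array([[-1.0, 0.0], [0.0, -1.0]])   # fixed-vertex columns
x, y = fdm_equilibrium(C, C_f,
                       q=np.array([1.0, 1.0]),
                       p_x=np.array([0.0]), p_y=np.array([-1.0]),
                       x_f=np.array([0.0, 2.0]), y_f=np.array([0.0, 0.0]))
print(x, y)   # [1.] [-0.5]: the free node settles below the midpoint of the supports
```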

It turns out to be incredibly useful to have this method to parametrize the set of all graphs in static equilibrium. This method is used to calculate the deformation of nets and cloth, and to design load-bearing trusses; it is also used to generate trusses based on particular design parameters. In a later article we will also describe how one can use FDM to find an explicit representation for the set of possible point configurations which are in static equilibrium.

Categories
Research

Hidden Quivers: Supporting the Manifold Hypothesis

Quivers are a tool known to help us simplify problems in math. In particular, representations of quivers contribute to geometric perspectives in representation theory: the theory of reducing complex algebraic structures to simpler ones. Less well known is that neural networks can also be represented using quiver representation theory.

Fundamentally, a quiver is just a directed graph.

Intrinsic definitions to consider include:

  • A source vertex of a quiver has no edges directed towards it
  • A sink vertex has no edges directed away from it
  • A loop in a quiver is an oriented edge such that the start vertex is the same as the end vertex

A fancy type of quiver known as an Auslander-Reiten quiver, courtesy of the author. But remember: a quiver is simply a directed graph.

Just like an MLP, a network quiver \(Q\) is arranged by input, output, and hidden layers in between. Likewise, they also have input vertices (a subset of source vertices), bias vertices (the source vertices that are not input vertices), and output vertices (sinks of \(Q\)). All remaining vertices are hidden vertices. The hidden quiver \(\tilde{Q}\) consists of all hidden vertices \(\tilde{V}\) of \(Q\) and all oriented edges \(\tilde{E}\) between \(\tilde{V}\) of \(Q\) that are not loops.

Def: A network quiver \(Q\) is a quiver arranged by layers such that:

  1. There are no loops on source (input and bias) or sink vertices.
  2. There exists exactly one loop on each hidden vertex.

For any quiver \(Q\), we can also define its representation \(\mathcal{Q}\), in which we assign a vector space to each vertex of \(Q\) and regard our directed edges of \(Q\) as \(k\)-linear maps. In a thin representation, each \(k\)-linear map is simply a \(1\times1\) matrix.

A quiver with 4 vertices, courtesy of the author.
A representation of the quiver directly above, courtesy of the author.

Defining a neural network \((W, f)\) over a network quiver \(Q\), where \(W\) is a specific thin representation and \(f = (f_v)_{v \in V}\) are activation functions, allows much of the language and ideas of quiver representation theory to carry over to neural networks.

A neural network over a network quiver.

When a neural network like an MLP does its forward pass, it gives rise to a pointwise activation function \(f\), defined here as a one variable non-linear function \(f: \mathbb{C} \to \mathbb{C}\) differentiable except in a set of measure zero. We assign these activation functions to loops of \(Q\).

Further, for a neural network \((W, f)\) over \(Q\), we have a network function

$$ \Psi(W, f): \mathbb{C}^d \to \mathbb{C}^k $$

where the coordinates of \(\Psi(W, f)(x)\) are the scores of the neural net, namely the activation outputs of the output vertices of \((W, f)\) with respect to an input data vector \(x \in \mathbb{C}^d\).
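
A toy example makes the definition concrete: the quiver, weights, and activations below are entirely made up, but the forward pass is exactly the network function \(\Psi(W, f)\) evaluated at an input \(x\).

```python
# Toy network quiver: input vertex 0, bias vertex 1, hidden vertices 2 and 3, output vertex 4.
# The thin representation W assigns a scalar weight to every (non-loop) edge.
edges = {            # (tail, head): weight
    (0, 2): 0.8, (1, 2): 0.1,
    (0, 3): -0.5, (1, 3): 0.3,
    (2, 4): 1.2, (3, 4): -0.7,
}
hidden, outputs = [2, 3], [4]
relu = lambda t: max(t, 0.0)
activations = {2: relu, 3: relu}    # f = (f_v), one activation per hidden-vertex loop

def network_function(x, bias=1.0):
    """Psi(W, f)(x): propagate values through the quiver in topological order."""
    value = {0: x, 1: bias}
    for v in hidden + outputs:
        pre = sum(w * value[t] for (t, h), w in edges.items() if h == v)
        value[v] = activations.get(v, lambda t: t)(pre)   # output vertices use the identity
    return [value[v] for v in outputs]

print(network_function(2.0))   # approximately [2.04]: the score at the single output vertex
```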

The manifold hypothesis critical to deep learning proposes that high-dimensional data actually lies in a low-dimensional, latent manifold within the input space. We can map the input space to the geometric moduli space of neural networks \(_d\mathcal{M}_k(\tilde{Q})\) so that our latent manifold is also translated to the moduli space. While \(_d\mathcal{M}_k(\tilde{Q})\) depends on the combinatorial structure of the neural network, activation and weight architectures of the neural network determine how data is distributed inside the moduli space.

A three-dimensional data manifold.

We will approach the manifold hypothesis via framed quiver representations. A choice of a thin representation \(\tilde{\mathcal{Q}}\) of the hidden quiver \(\tilde{Q}\) and a map \(h\) from the hidden representation \(\tilde{\mathcal{Q}}\) to hidden vertices determine a pair \((\tilde{\mathcal{Q}}, h)\), where \(h = \{h_v\}_{v \in \tilde{V}}\). The pair \((\tilde{\mathcal{Q}}, h)\) is used to denote our framed quiver representation.

Def: A double-framed thin quiver representation is a triple \((l, \tilde{\mathcal{Q}}, h)\) where:

  • \(\tilde{\mathcal{Q}}\) is a thin representation of the hidden quiver \(\tilde{Q}\)
  • \((\tilde{\mathcal{Q}}, h)\) is a framed representation of \(\tilde{Q}\)
  • \((\tilde{\mathcal{Q}}, l)\) is a co-framed representation of \(\tilde{Q}\) (the dual of a framed representation)

Denote by \(_d\mathcal{R}_k(\tilde{\mathcal{Q}})\) the space of all double-framed thin quiver representations. We will use stable double-framed thin quiver representations in our construction of moduli space.

Def: A double-framed thin quiver representation \(\texttt{W}_k^f = (l, \tilde{\mathcal{Q}}, h)\) is stable if:

  1. The only sub-representation of \(\tilde{\mathcal{Q}}\) contained in the kernel of \(h\) is the zero sub-representation
  2. The only sub-representation of \(\tilde{\mathcal{Q}}\) contained in the image of \(l\) is \(\tilde{\mathcal{Q}}\)

Def: We present the moduli space of double-framed thin quiver representations as

$$ _d\mathcal{M}_k(\tilde{Q}) := \{\, [V] : V \in {}_d\mathcal{R}_k(\tilde{\mathcal{Q}}) \ \text{is stable} \,\}. $$

The moduli space depends on the hidden quiver as well as the chosen vector spaces. Returning to neural networks \((W, f)\), and given an input data vector \(x \in \mathbb{C}^d\), we can define a map

$$ \varphi(W, f): \mathbb{C}^d \to {}_d\mathcal{R}_k(\tilde{\mathcal{Q}}), \qquad x \mapsto \texttt{W}_k^f. $$

This map takes values in the moduli space, the points of which parametrize isomorphism classes of stable double-framed thin quiver representations. Thus we have

$$ \varphi(W, f): \mathbb{C}^d \to _d\mathcal{M}_k(\tilde{Q}).
$$

As promised, we have mapped our input space containing our latent manifold to the moduli space \(_d\mathcal{M}_k(\tilde{Q})\) of neural networks, giving mathematical support to the manifold hypothesis.

Independent of the architecture, activation function, data, or task, any decision of any neural network passes through the moduli (as well as representation) space. With our latent manifold translated into the moduli space, we have an algebro-geometric way to continue to study the dynamics of neural network training.

Looking through the unsuspecting lens of quiver representation theory has the potential to provide new insights in deep learning, where network quivers appear as a combinatorial tool for understanding neural networks and their moduli spaces. More concretely:

  • Continuity and differentiability of the network function \(\Psi(W, f)\) and map \(\varphi(W, f)\) should allow us to apply further algebro-geometric tools to the study of neural networks, including to our constructed moduli space \(_d\mathcal{M}_k(\tilde{Q})\).
  • Hidden quivers can aid us in comprehending optimization hyperparameters in deep learning. We may be able to transfer gradient descent optimization to the setting of the moduli space.
  • Studying training within moduli spaces can lead to the development of new convergence theorems to guide deep learning.
  • The dimension of \(_d\mathcal{M}_k(\tilde{Q})\) could be used to quantify the capacity of neural networks.

The manifold hypothesis has played a ubiquitous role throughout deep learning since originally posed, and formalizing its existence via the moduli of quiver representations can help us to understand and potentially improve upon the effectiveness of neural networks and their latent spaces.

Notes and Acknowledgements. Content for this post was largely borrowed from and inspired by The Representation Theory of Neural Networks, smoothing over many details more rigorously presented in the original paper. We thank the 2025 SGI organizers and sponsors for supporting the author’s first deep learning-related research experience via the “Topology Control” project as well as mentors and other research fellows involved for their diverse expertise and patience.
