I kicked off this project with my advisor, Yu Wang, over Zoom the weekend before our official start, checked in mid-week to ask questions and give a progress report, and wrapped up the week with a final Zoom review—each meeting improving my understanding of the underlying problem and of how to refine my approach in Python. I’m excited to share how this quasi-harmonic blend of PDE insight and mesh-based propagation can yield both speed and exactness in geodesic computation.
Finding the shortest path on a curved surface—called the geodesic distance—is deceptively challenging. Exact methods track how a wavefront would travel across every triangle of the mesh, which is accurate but can be painfully slow on complex shapes. The Heat Method offers a clever shortcut: imagine dropping a bit of heat at your source point, let it diffuse for a moment, then “read” the resulting temperature field to infer distances. By solving two linear systems—one for the heat spread and one to recover a distance-like potential—you get a fast, global approximation. It runs in near-linear time and parallelizes beautifully, but it can smooth over sharp creases and slightly misestimate in highly curved regions.
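To make the two-solve structure concrete, here is a minimal sketch of the Heat Method on a triangle mesh (vertex array \(V\), face array \(F\)), following the description in Crane, Weischedel, and Wallner’s paper. The assembly helpers, time-step choice, and regularization are standard conventions I am assuming for illustration—this is not the project’s actual code.

```python
# Minimal Heat Method sketch: two sparse linear solves on a triangle mesh.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def cotan_laplacian_and_mass(V, F):
    """Positive semidefinite cotangent stiffness matrix L and lumped mass M."""
    n = len(V)
    I, J, W = [], [], []
    mass = np.zeros(n)
    for f in F:
        for k in range(3):
            i, j, l = f[k], f[(k + 1) % 3], f[(k + 2) % 3]
            a, b = V[i] - V[l], V[j] - V[l]
            cot = np.dot(a, b) / np.linalg.norm(np.cross(a, b))  # angle at l
            I += [i, j, i, j]; J += [j, i, i, j]
            W += [-0.5 * cot, -0.5 * cot, 0.5 * cot, 0.5 * cot]
        area = 0.5 * np.linalg.norm(np.cross(V[f[1]] - V[f[0]], V[f[2]] - V[f[0]]))
        mass[f] += area / 3.0           # a third of each face's area per corner
    return sp.csr_matrix((W, (I, J)), shape=(n, n)), sp.diags(mass)

def heat_geodesic(V, F, source):
    L, M = cotan_laplacian_and_mass(V, F)
    h = np.mean([np.linalg.norm(V[f[k]] - V[f[(k + 1) % 3]])
                 for f in F for k in range(3)])
    t = h * h                           # diffusion time ~ mean edge length squared
    delta = np.zeros(len(V)); delta[source] = 1.0
    u = spla.spsolve((M + t * L).tocsc(), delta)        # solve 1: heat spread
    div = np.zeros(len(V))
    for f in F:
        vi = V[f]
        N = np.cross(vi[1] - vi[0], vi[2] - vi[0])
        A2 = np.linalg.norm(N)          # twice the face area
        N = N / A2
        # Gradient of u on this face, normalized to point "downhill".
        g = sum(u[f[k]] * np.cross(N, vi[(k + 2) % 3] - vi[(k + 1) % 3])
                for k in range(3)) / A2
        X = -g / (np.linalg.norm(g) + 1e-12)
        for k in range(3):              # integrated divergence of X per vertex
            i, j, l = f[k], f[(k + 1) % 3], f[(k + 2) % 3]
            cot1 = np.dot(V[i] - V[l], V[j] - V[l]) / A2   # opposite edge (i, j)
            cot2 = np.dot(V[i] - V[j], V[l] - V[j]) / A2   # opposite edge (i, l)
            div[i] += 0.5 * (cot1 * np.dot(V[j] - V[i], X)
                             + cot2 * np.dot(V[l] - V[i], X))
    # Solve 2: Poisson equation for the distance-like potential; the minus
    # sign accounts for L being the positive semidefinite stiffness matrix.
    phi = spla.spsolve((L + 1e-9 * sp.eye(len(V))).tocsc(), -div)
    return phi - phi[source]            # shift so the source sits at distance 0
```

Both solves reuse the same sparse matrices, which is exactly why the method amortizes so well across many source points.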
To sharpen accuracy where it matters, I adopted a hybrid strategy. First, detect “barrier” edges—those sharp creases where two faces meet at a steep dihedral angle—and temporarily slice the mesh along them. Then apply the Heat Method independently on each nearly-flat patch, pinning the values along the cut edges to their true geodesic distances. Finally, stitch everything back together by running a precise propagation step only along those barrier edges. The result is a distance field that retains the Heat Method’s speed and scalability on smooth regions, yet achieves exactness along critical creases. It’s remarkable how something as seemingly simple as measuring distance on a surface can lead into rich territory—mixing partial-differential equations, sparse linear algebra, and discrete geometry in pursuit of both efficiency and precision.
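As one illustration of the first step, barrier edges can be flagged by thresholding the angle between adjacent face normals. The threshold value and the function name here are assumptions for the sketch, not the project’s exact parameters.

```python
# Flag interior edges whose adjacent face normals disagree sharply.
import numpy as np
from collections import defaultdict

def barrier_edges(V, F, crease_deg=40.0):
    """Return interior edges (i, j) whose crease angle exceeds crease_deg."""
    normals = []
    edge_faces = defaultdict(list)
    for fi, (a, b, c) in enumerate(F):
        n = np.cross(V[b] - V[a], V[c] - V[a])
        normals.append(n / np.linalg.norm(n))
        for e in [(a, b), (b, c), (c, a)]:
            edge_faces[tuple(sorted(e))].append(fi)
    creases = []
    for edge, faces in edge_faces.items():
        if len(faces) == 2:             # interior edge shared by two faces
            cosang = np.clip(np.dot(normals[faces[0]], normals[faces[1]]), -1.0, 1.0)
            if np.degrees(np.arccos(cosang)) > crease_deg:
                creases.append(edge)
    return creases
```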
Differential geometry makes heavy use of calculus in order to study geometric objects such as smooth manifolds. Just as differential geometry builds upon calculus and other fundamentals, the notion of differential forms extends the approaches of differential geometry and calculus, even allowing us to study the larger class of surfaces that are not necessarily smooth. Starting from calculus, we will take a brief tour of differential forms and their geometric potential.
Review of Vector Fields
Recall that a vector field is a special case of a multivariable vector-valued function: it takes in a point, given by a scalar or a vector, and outputs a vector of matching dimension. Often vector fields are defined in \(\mathbb{R}^2\) or \(\mathbb{R}^3\), but they can also be defined more generally on smooth manifolds.
In three-dimensional space, a vector field is a function \(V\) that assigns to each point \((x, y, z)\) a three-dimensional vector \(V(x, y, z)\). Consider the identity function
$$ V(x, y, z) = \begin{bmatrix}x\\y\\z \end{bmatrix}, $$
where for each point \((x, y, z)\) we associate a vector with those same coordinates. Generalizing, for any vector function \(V(x, y, z)\), we visualize its vector field by placing the tail of the corresponding vector \(V(x_0, y_0, z_0)\) at its point \((x_0, y_0, z_0)\) in the domain of \(V(x, y, z)\).
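As a quick illustration (assuming matplotlib is available), we can draw a 2D slice of this identity field with a quiver plot, placing each arrow’s tail at its base point exactly as described:

```python
# Visualize the planar identity field V(x, y) = (x, y) with a quiver plot.
import numpy as np
import matplotlib.pyplot as plt

x, y = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))
plt.quiver(x, y, x, y)   # tail at each point (x, y), arrow V(x, y) = (x, y)
plt.gca().set_aspect("equal")
plt.title(r"$V(x, y) = (x, y)$")
plt.show()
```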
Here we consider fluid particles (displayed as blue dots) as points, each with its own velocity corresponding to the overall fluid flow. To each point in our sample of particles we can then assign a vector whose direction is the particle’s direction of motion and whose length is its speed. Each arrow not only represents the velocity of the individual particle it is attached to, but also reflects how the neighborhood of particles around it moves.
Scalar fields can be transformed into vector fields via \(\nabla\), a vector differential operator known as the gradient operator.
The gradient points in the direction of steepest ascent of a scalar function \(f(x_1, x_2, x_3, \dots, x_n)\), indicating the direction of greatest change of \(f\). A vector field \(V\) over an open set \(S\) is a gradient field if there exists a differentiable real-valued function (the scalar field) \(f\) on \(S\). Then
$$ V = \nabla f $$
is a special type of vector field defined as the gradient of a scalar function, where the vector assigned at each point is the gradient of \(f\) at that point.
The Divergence and Curl Operators
Now that we have reviewed vector fields, recall that we can also perform operations like divergence and curl over them. When fluid flows over time, some regions of fluid will become less dense with particles as they flow away (positive divergence). Meanwhile, particles tending towards each other will cause the fluid in that region to become more dense (negative divergence).
Luckily, we have our handy divergence operator to take in the vector-valued function of our vector field and output a scalar-valued function measuring the change in density of the fluid around each point. Given a vector field \(V = (v_1, \dots, v_n)\), we can think of divergence
$$ \nabla \cdot V = \frac{\partial v_1}{\partial x_1} + \frac{\partial v_2}{\partial x_2} + \dots + \frac{\partial v_n}{\partial x_n} $$
as a variation of the derivative that generalizes to vector fields \(V\) of any dimension. Another “trick” for thinking about the divergence formula is as the trace of the Jacobian matrix of the vector field.
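A quick numerical sanity check of this trace-of-the-Jacobian view, on a sampled grid (the field and grid are illustrative choices):

```python
# Finite-difference divergence of the identity field, which should be 2.
import numpy as np

x, y = np.meshgrid(np.linspace(-2, 2, 101), np.linspace(-2, 2, 101), indexing="ij")
v1, v2 = x, y                      # the identity field V(x, y) = (x, y)
dv1_dx = np.gradient(v1, x[:, 0], axis=0)
dv2_dy = np.gradient(v2, y[0, :], axis=1)
div = dv1_dx + dv2_dy              # trace of the Jacobian [[v1_x, v1_y], [v2_x, v2_y]]
print(div.mean())                  # ~2.0 everywhere
```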
In differential geometry, one can also define divergence on a Riemannian manifold \((M, g)\): a smooth manifold \(M\) endowed with a Riemannian metric \(g\), a choice of inner product for each tangent space of \(M\).
The Riemannian metric allows us to take the inner product of two vectors tangent to the sphere.
Whereas divergence describes changes in density, curl measures the “rotation”, or angular momentum, of the vector field \(V\). Likewise, the curl operator takes in a vector-valued function and outputs a function that describes the flow of rotation given by the vector field at each point.
Here, all the points go in circles in the counterclockwise direction, which we define by \(V(x, y) = \begin{bmatrix} -y \\ x \end{bmatrix} = -y\hat{i} + x\hat{j}\).
Writing \(V(x, y) = \begin{bmatrix} v_1(x, y) \\ v_2(x, y) \end{bmatrix}\), we can measure the rotation around a point \((x_0, y_0)\) by looking at the change in \(v_1\) as \(y\) changes and in \(v_2\) as \(x\) changes. We calculate
$$\text{2d curl}\ V = \nabla \times V = \frac{\partial v_2}{\partial x}(x_0, y_0) - \frac{\partial v_1}{\partial y}(x_0, y_0),$$
with a positive evaluation indicating counterclockwise rotation around \((x_0, y_0)\) and a negative evaluation indicating clockwise.
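The same finite-difference check applied to the rotation field above (the grid is again an illustrative choice) recovers a constant positive 2d curl, confirming the counterclockwise rotation:

```python
# Finite-difference 2d curl of V(x, y) = (-y, x), which should be 2 everywhere.
import numpy as np

x, y = np.meshgrid(np.linspace(-2, 2, 101), np.linspace(-2, 2, 101), indexing="ij")
v1, v2 = -y, x                     # counterclockwise rotation field
dv2_dx = np.gradient(v2, x[:, 0], axis=0)
dv1_dy = np.gradient(v1, y[0, :], axis=1)
curl = dv2_dx - dv1_dy
print(curl.mean())                 # ~2.0 > 0: counterclockwise
```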
Approach by Differential Forms
Vector calculus can also be viewed more generally and geometrically via the theory of differential forms.
While curl is typically only defined in two and three dimensions, it can be generalized in the context of differential forms. The calculus of differential \(k\)-forms offers a unified approach to defining integrands over curves, surfaces, and higher-dimensional manifolds. Differential 1-forms, expressions of the form
$$\omega = f(x)dx$$
are defined on an open interval \( U \subseteq \mathbb{R}\), with \(f: U \to \mathbb{R}\) a smooth function. We can likewise have 1-forms
$$\omega = f(x, y)dx + g(x, y)dy$$
defined on an open subset \( U \subseteq \mathbb{R}^2\),
$$\omega = f(x, y, z)dx + g(x, y, z)dy + h(x, y, z)dz $$
defined on an open subset \( U \subseteq \mathbb{R}^3\), and so on for any positive integer \(n\).
Correspondence Theorem: Given a 1-form \(\omega = f_1dx_1 + f_2dx_2 + \dots + f_ndx_n\), we can define an associated vector field with component functions \((f_1, f_2, \dots, f_n)\). Conversely, given a smooth vector field \(V = (f_1, f_2, \dots, f_n)\), we can define an associated 1-form \(\omega = f_1dx_1 + f_2dx_2 + \dots + f_ndx_n\).
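For instance, in the plane the pairing matches
$$ \omega = y\,dx + x\,dy \quad \longleftrightarrow \quad V = (y, x), $$
so statements about the 1-form translate directly into statements about its vector field, and vice versa.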
Via the exterior derivative and an extension of the correspondence between 1-forms and vector fields to arbitrary \(k\)-forms, we can generalize the well-known operators of vector calculus.
The exterior derivative \(d\omega\) of a \(k\)-form \(\omega\) is simply a \((k + 1)\)-form. We provide an example of its use in defining curl:
Let \(\omega = f(x, y, z)dx + g(x, y, z)dy + h(x, y, z)dz\) be a 1-form on \( U \subseteq \mathbb{R}^3\) with its associated vector field \(V = (f, g, h)\). Its exterior derivative \(d\omega\) is the 2-form
$$ d\omega = \left(\frac{\partial h}{\partial y} - \frac{\partial g}{\partial z}\right)dy \wedge dz + \left(\frac{\partial f}{\partial z} - \frac{\partial h}{\partial x}\right)dz \wedge dx + \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right)dx \wedge dy, $$
and we define the curl of \(V\)
$$ \nabla \times V = \left(\frac{\partial h}{\partial y} - \frac{\partial g}{\partial z},\ \frac{\partial f}{\partial z} - \frac{\partial h}{\partial x},\ \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right) $$
as the vector field associated to the 2-form \(d\omega\). Check that this definition of curl agrees with the traditional version, while extending to higher dimensions.
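We can verify the agreement symbolically; here is a small SymPy check on an arbitrary example field (the field \((-yz, xz, xy)\) is an illustrative choice):

```python
# Compare the 2-form coefficients of d(omega) with the classical curl.
import sympy as sp
from sympy.vector import CoordSys3D, curl

x, y, z = sp.symbols("x y z")
f, g, h = -y*z, x*z, x*y           # an arbitrary smooth field V = (f, g, h)
# Coefficients of d(omega) on dy^dz, dz^dx, dx^dy, read off from the formula:
from_form = (sp.diff(h, y) - sp.diff(g, z),
             sp.diff(f, z) - sp.diff(h, x),
             sp.diff(g, x) - sp.diff(f, y))
# Classical curl computed by sympy.vector for comparison:
R = CoordSys3D("R")
classical = curl(-R.y*R.z*R.i + R.x*R.z*R.j + R.x*R.y*R.k)
print(from_form)                   # (0, -2*y, 2*z)
print(classical)                   # (-2*R.y)*R.j + (2*R.z)*R.k -- the same field
```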
The natural pairing between vector fields and 1-forms allows for the modern approach of \(k\)-forms for multivariable calculus and has also heavily influenced the area of geometric measure theory (GMT), where we define currents as functionals on the space of \(k\)-forms. Via currents, we have the impressive result that a surface is uniquely determined by the results of integrating differential forms over it. Further, the differential forms fundamental to GMT open up our toolbox to studying geometry that is not necessarily smooth!
Acknowledgements. We thank Professor Chien, Erick Jiminez Berumen, and Matheus Araujo for exposing us to new math, questions, and insights, as well as for their mentorship and dedication throughout the “Developing a Medial Axis for Measures” project. We also acknowledge Professor Solomon for suggesting edits incorporated into this post, as well as for his organization of the wonderful summer program in the first place!
Thanks to Ioannis Mirtsopoulos, Elshadai Tegegn, Diana Avila, Jorge Rico Esteban, and Tinsae Tadesse for invaluable assistance with this article.
A truss is a collection of rigid beams connected together with joints which can rotate, but not slip past each other. Trusses are used throughout engineering, architecture, and other structural design; you’ll see them in railway bridges, supporting ceilings, and forming the foundation for many older roller coasters. When designing trusses, it is often necessary to design for static equilibrium; that is, such that the forces at each joint in the truss sum to zero, and the truss does not move. One of the most effective ways to accomplish this is by parameterizing our graphs through the force density of each edge: the ratio between the axial force along that edge and the edge’s length. Our goal for this article is to show that these force densities parametrize graphs in static equilibrium, subject to certain constraints.
The first constraint we will place upon our system is topological. Recall that a graph is a collection of vertices, together with a collection of edges linking pairs of those vertices. We will fix, once and for all, a graph \(G\) to represent the topology of our truss. We will also designate some of the vertices of our graph as fixed vertices and the rest as free vertices. The fixed vertices represent vertices which are attached to some solid medium external to the truss, and so experience normal forces which cancel out any forces which are present from the loading of the truss. In addition to these fixed vertices, to each free vertex we assign a load, which represents the force placed on that point.
First we introduce the incidence matrix of a (directed) graph. Recall that a directed graph is a graph where the edges have a distinguished ‘first’ vertex; that is, the edges are represented by ordered pairs rather than sets. Given such a graph, the incidence matrix of the graph is the matrix having in the \(i,j\)-th position a \(1\) if the \(i\)-th edge begins with the \(j\)-th vertex, a \(-1\) if the \(i\)-th edge ends with the \(j\)-th vertex, and a \(0\) otherwise. We split the entire incidence matrix \(A\) of our graph \(G\) into two matrices, \(C\) and \(C_f\), where \(C\) consists of the columns of \(A\) which correspond to free vertices, and \(C_f\) consists of those columns corresponding to fixed vertices. We will also work with the vectors \(p_x\) and \(p_y\) of loads, and the vectors \(x_f\) and \(y_f\) of fixed-vertex coordinates. We wish to construct a map \[F:\mathscr Q \to \mathscr C\] where \(\mathscr Q\) is the space of vectors \(q = (q_1, …, q_n)\) of force-length ratios, and \(\mathscr C\) is the space of all possible lists of coordinate pairs \((x, y) = (x_1, …, x_n, y_1, …, y_n)\). It turns out that the map we want is \[F(q) = \left(D^{-1}(p_x - D_fx_f), D^{-1}(p_y - D_fy_f)\right),\] where we set \(D = C^TQC\), \(D_f = C^TQC_f\), and \(Q\) is the diagonal matrix with the \(q_i\) along the diagonal. Clearly, this map is a rational map in the \(q_i\), and is regular away from where \(\det(D) = 0\) (when viewed as a function of \(q\)).
Derivation
We suppose we have a system with free coordinate vectors \((x, y) = (x_1, …, x_n, y_1, …, y_n)\) and fixed coordinate vectors \((x_f, y_f) = (x_{1f}, …, y_{1f}, …)\) which is in static equilibrium. First note that we can easily obtain the vectors of coordinate differences \(u\) and \(v\) (in the \(x\) and \(y\) directions respectively) using the incidence matrices \(C\) and \(C_f\): \[u = Cx + C_fx_f\] \[v = Cy + C_fy_f\] We form the diagonal matrices \(U\) and \(V\) corresponding to \(u\) and \(v\), letting \(L\) be the matrix with the lengths of the edges down the diagonal and \(s\) the vector of axial forces along each edge.

We note that if we divide the axial force along each edge by that edge’s length, and then multiply by the length in the \(x\) or \(y\) direction, we have calculated the force that edge contributes to each vertex at its ends. Multiplying on the left by the transpose of the incidence matrix \(C\) then gives the total force on each vertex from all the bars coincident with it, which must then be equal to the external load applied. This yields the equations \[C^TUL^{-1}s = p_x,\ \ \ C^TVL^{-1}s = p_y,\] which hold if and only if the system is in equilibrium.

We define \(q=L^{-1}s\) to be the vector of force densities, simplifying our equations to \[C^TUq = p_x,\ \ \ C^TVq = p_y.\] We note that \(Uq = Qu\) and \(Vq=Qv\), by the definition of matrix multiplication (each matrix is a diagonal matrix!). Thus we obtain, by the definitions of \(u\) and \(v\), \[p_x = C^TUq = C^TQ(Cx+C_fx_f) = C^TQCx+C^TQC_fx_f, \] \[p_y = C^TVq = C^TQ(Cy+C_fy_f) = C^TQCy+C^TQC_fy_f.\] We set, as above, \(D = C^TQC\) and \(D_f = C^TQC_f\), yielding \[p_x - D_fx_f= Dx , \] \[p_y - D_fy_f= Dy ,\] or, as promised, \[D^{-1}(p_x - D_fx_f)=x ,\] \[D^{-1}(p_y - D_fy_f)=y .\]

Clearly these equations hold if and only if the system is in equilibrium. Since every system (in equilibrium or not) has force densities, this yields an important corollary: every system which is in equilibrium is of the form \(F(q)\) for some choice of force densities \(q\), and no system which is not in equilibrium is of such a form.
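As a sanity check on the derivation, here is a minimal numpy sketch of the map \(F(q)\) for a toy truss: one free vertex hanging between two fixed anchors under a downward load. The geometry, load, and force densities are invented for illustration; the matrix names follow the derivation above.

```python
# Toy force-density computation: one free vertex, two fixed anchors.
import numpy as np

# Directed edges (tail, head); vertex 0 is free, vertices 1 and 2 are fixed.
# edge 0: (0 -> 1), edge 1: (0 -> 2)
C  = np.array([[1.], [1.]])            # incidence columns for the free vertex
Cf = np.array([[-1., 0.], [0., -1.]])  # incidence columns for the fixed vertices
xf, yf = np.array([-1., 1.]), np.array([1., 1.])   # anchor coordinates
px, py = np.array([0.]), np.array([-1.])           # load on the free vertex

def F(q):
    Q  = np.diag(q)
    D  = C.T @ Q @ C                   # D = C^T Q C
    Df = C.T @ Q @ Cf                  # D_f = C^T Q C_f
    x = np.linalg.solve(D, px - Df @ xf)
    y = np.linalg.solve(D, py - Df @ yf)
    return x, y

print(F(np.array([1.0, 1.0])))         # free vertex settles at (0.0, 0.5)
```

One can verify by hand that at \((0, 0.5)\) the two bar forces and the load sum to zero, exactly as the equilibrium equations demand.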
It turns out to be incredibly useful to have this method, known as the force density method (FDM), to parametrize the set of all graphs in static equilibrium. The method is used to calculate the deformation of nets and cloth and to design load-bearing trusses; it is also used to generate trusses based on particular design parameters. In a later article we will also describe how one can use FDM to find an explicit representation for the set of possible point configurations which are in static equilibrium.
Quivers are a tool known for helping simplify problems in math. In particular, representations of quivers contribute geometric perspectives to representation theory: the theory of reducing complex algebraic structures to simpler ones. Less well known is that neural networks can also be described using quiver representation theory.
Fundamentally, a quiver is just a directed graph.
Intrinsic definitions to consider include:
A source vertex of a quiver has no edges directed towards it
A sink vertex has no edges directed away from it
A loop in a quiver is an oriented edge such that the start vertex is the same as the end vertex
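Since a quiver is just a directed graph, these intrinsic notions are straightforward to compute; here is a minimal, purely illustrative sketch in Python (the example edge list is made up):

```python
# A quiver as a plain list of directed edges (start, end).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 1)]   # (1, 1) is a loop
vertices = {v for e in edges for v in e}

sources = {v for v in vertices if all(t != v for (_, t) in edges)}  # no edges in
sinks   = {v for v in vertices if all(s != v for (s, _) in edges)}  # no edges out
loops   = [(s, t) for (s, t) in edges if s == t]
print(sources, sinks, loops)       # {0} {3} [(1, 1)]
```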
A fancy type of quiver known as an Auslander-Reiten quiver, courtesy of the author. But remember: a quiver is simply a directed graph.
Just like an MLP, a network quiver \(Q\) is arranged by input, output, and hidden layers in between. Likewise, they also have input vertices (a subset of source vertices), bias vertices (the source vertices that are not input vertices), and output vertices (sinks of \(Q\)). All remaining vertices are hidden vertices. The hidden quiver \(\tilde{Q}\) consists of all hidden vertices \(\tilde{V}\) of \(Q\) and all oriented edges \(\tilde{E}\) between \(\tilde{V}\) of \(Q\) that are not loops.
Def: A network quiver \(Q\) is a quiver arranged by layers such that:
There are no loops on source (input and bias) or sink vertices.
There exists exactly one loop on each hidden vertex.
For any quiver \(Q\), we can also define its representation \(\mathcal{Q}\), in which we assign a vector space to each vertex of \(Q\) and assign to each directed edge of \(Q\) a \(k\)-linear map between the corresponding vector spaces. In a thin representation, each vector space is one-dimensional, so each \(k\)-linear map is simply a \(1\times1\) matrix.
A quiver with 4 vertices, courtesy of the author.
A representation of the quiver directly above, courtesy of the author.
Defining a neural network \((W, f)\) over a network quiver \(Q\), where \(W\) is a specific thin representation and \(f = (f_v)_{v \in V}\) are activation functions, allows much of the language and ideas of quiver representation theory to carry over to neural networks.
When a neural network like an MLP performs its forward pass, each hidden unit applies a pointwise activation function \(f\), defined here as a one-variable non-linear function \(f: \mathbb{C} \to \mathbb{C}\) differentiable except on a set of measure zero. We assign these activation functions to the loops of \(Q\).
Further, for a neural network \((W, f)\) over \(Q\), we have a network function
$$ \Psi(W, f): \mathbb{C}^d \to \mathbb{C}^k $$
where the coordinates of \(\Psi(W, f)(x)\) are the scores of the neural net, i.e., the activation outputs of the output vertices of \((W, f)\) with respect to an input data vector \(x \in \mathbb{C}^d\).
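To make this concrete, here is a toy forward pass over a small network quiver with thin weights (scalars) on edges and an activation on each hidden loop. The quiver, the weights, and the choice of a real ReLU activation (rather than a map on \(\mathbb{C}\)) are simplifying assumptions for illustration.

```python
# Toy network function over a network quiver with thin (scalar) weights.
# Quiver: inputs {0, 1}, bias {2}, hidden {3, 4}, output {5}.
edges = {(0, 3): 0.5, (1, 3): -1.0, (0, 4): 2.0,
         (2, 4): 0.3, (3, 5): 1.0, (4, 5): -0.5}   # (tail, head) -> weight
hidden, outputs = [3, 4], [5]
relu = lambda t: max(t, 0.0)       # activation assigned to each hidden loop

def network_function(x):
    act = {0: x[0], 1: x[1], 2: 1.0}               # bias vertices emit 1
    for v in hidden + outputs:                     # vertices in topological order
        pre = sum(w * act[u] for (u, t), w in edges.items() if t == v)
        act[v] = relu(pre) if v in hidden else pre
    return [act[v] for v in outputs]               # scores at the output sinks

print(network_function([1.0, 2.0]))                # [-1.15]
```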
The manifold hypothesis critical to deep learning proposes that high-dimensional data actually lies in a low-dimensional, latent manifold within the input space. We can map the input space to the geometric moduli space of neural networks \(_d\mathcal{M}_k(\tilde{Q})\) so that our latent manifold is also translated to the moduli space. While \(_d\mathcal{M}_k(\tilde{Q})\) depends on the combinatorial structure of the neural network, activation and weight architectures of the neural network determine how data is distributed inside the moduli space.
We will approach the manifold hypothesis via framed quiver representations. A choice of a thin representation \(\tilde{\mathcal{Q}}\) of the hidden quiver \(\tilde{Q}\) together with a map \(h = \{h_v\}_{v \in \tilde{V}}\) into the hidden vertices determines a pair \((\tilde{\mathcal{Q}}, h)\). The pair \((\tilde{\mathcal{Q}}, h)\) is used to denote our framed quiver representation.
Def: A double-framed thin quiver representation is a triple \((l, \tilde{\mathcal{Q}}, h)\) where:
\(\tilde{\mathcal{Q}}\) is a thin representation of the hidden quiver \(\tilde{Q}\)
\((\tilde{\mathcal{Q}}, h)\) is a framed representation of \(\tilde{Q}\)
\((\tilde{\mathcal{Q}}, l)\) is a co-framed representation of \(\tilde{Q}\) (the dual of a framed representation)
Denote by \(_d\mathcal{R}_k(\tilde{Q})\) the space of all double-framed thin quiver representations. We will use stable double-framed thin quiver representations in our construction of the moduli space.
Def: A double-framed thin quiver representation \(\texttt{W}_k^f = (l, \tilde{\mathcal{Q}}, h)\) is stable if:
The only sub-representation of \(\tilde{\mathcal{Q}}\) contained in the kernel of \(h\) is the zero sub-representation
The only sub-representation of \(\tilde{\mathcal{Q}}\) contained in the image of \(l\) is \(\tilde{\mathcal{Q}}\)
Def: We present the moduli space of double-framed thin quiver representations as
$$ {}_d\mathcal{M}_k(\tilde{Q}) = \{\text{stable } (l, \tilde{\mathcal{Q}}, h) \in {}_d\mathcal{R}_k(\tilde{Q})\} \big/ \cong. $$
The moduli space depends on the hidden quiver as well as the chosen vector spaces. Returning to neural networks \((W, f)\), and given an input data vector \(x \in \mathbb{C}^d\), we can define a map
$$ \varphi(W, f): \mathbb{C}^d \to {}_d\mathcal{M}_k(\tilde{Q}). $$
This map takes values in the moduli space, the points of which parametrize isomorphism classes of stable double-framed thin quiver representations.
As promised, we have mapped our input space containing our latent manifold to the moduli space \(_d\mathcal{M}_k(\tilde{Q})\) of neural networks, mathematically validating the manifold hypothesis.
Independent of the architecture, activation function, data, or task, any decision of any neural network passes through the moduli (as well as representation) space. With our latent manifold translated into the moduli space, we have an algebro-geometric way to continue to study the dynamics of neural network training.
Looking through the unsuspecting lens of quiver representation theory has the potential to provide new insights in deep learning, where network quivers appear as a combinatorial tool for understanding neural networks and their moduli spaces. More concretely:
Continuity and differentiability of the network function \(\Psi(W, f)\) and map \(\varphi(W, f)\) should allow us to apply further algebro-geometric tools to the study of neural networks, including to our constructed moduli space \(_d\mathcal{M}_k(\tilde{Q})\).
Hidden quivers can aid us in comprehending optimization hyperparameters in deep learning. We may be able to transfer gradient descent optimization to the setting of the moduli space.
Studying training within moduli spaces can lead to the development of new convergence theorems to guide deep learning.
The dimension of \(_d\mathcal{M}_k(\tilde{Q})\) could be used to quantify the capacity of neural networks.
The manifold hypothesis has played a ubiquitous role throughout deep learning since it was originally posed, and formalizing it via the moduli of quiver representations can help us to understand and potentially improve upon the effectiveness of neural networks and their latent spaces.
Notes and Acknowledgements. Content for this post was largely borrowed from and inspired by The Representation Theory of Neural Networks, smoothing over many details that are more rigorously presented in the original paper. We thank the 2025 SGI organizers and sponsors for supporting the author’s first deep learning-related research experience via the “Topology Control” project, as well as the mentors and other research fellows involved for their diverse expertise and patience.