
Higher-Order Discretization of Mean Curvature Flow

Author: Ehsan Shams (Alexandria University, EG)

Project Mentor: Justin Solomon (MIT, USA)
Volunteers: Biruk Ambaw (Université Paris-Saclay, FR), Andrew Rodriguez (Georgia Institute of Technology, USA), and Josh Vekhter (UT Austin, USA)

Acknowledgments. Sincere thanks to Professor Justin Solomon for his invaluable guidance throughout this project. His carefully designed questions and coding tasks not only deepened my understanding of core topics in geometry processing in a short amount of time, but also sharpened my coding skills—and ensured my lunch breaks were notably shorter than they might have been otherwise :). I would also like to thank Josh Vekhter, Andrew, and Biruk for their support and valuable feedback to my teammates and me. In addition to the math and code, Professor Justin and Josh taught me the value of a well-timed joke to lighten the load!

Introduction

Mean Curvature Flow (MCF) is a fundamental geometric evolution partial differential equation (PDE) that describes the motion of a surface \(\mathcal{M}_t \subset \mathbb{R}^3\) under its mean curvature. Each point on the surface moves in the direction of the unit normal vector to the surface with velocity proportional to the mean curvature, leading to a smoothing effect that regularizes the geometry of the surface over time. MCF is widely studied in differential geometry and geometric analysis due to its intrinsic connection to minimal surfaces and its role in shape optimization. In computational applications, MCF is particularly useful in geometry processing (GP), including tasks such as surface fairing, mesh denoising, and feature-preserving smoothing.

The discretization of the MCF PDE consists of both spatial and temporal components. Spatial discretization is well-established, with common techniques such as finite element methods, discrete exterior calculus, and discrete differential geometry effectively approximating the part of the equation that governs the surface’s spatial evolution. In contrast, temporal discretization remains challenging due to stability constraints and accuracy limitations.

Three primary approaches exist for temporal discretization: explicit, implicit, and semi-implicit methods. Explicit methods, like forward Euler, often become unstable with larger time steps. Implicit methods, such as backward Euler, offer stability but at a high computational cost. Semi-implicit methods, like the one introduced by Desbrun et al. (1999), strike a compromise between these two extremes but may still fall short in terms of accuracy and stability. In practice, the temporal discretizations commonly employed in the literature are only first-order accurate and thus may not provide the desired level of accuracy and stability.

The central focus of this project was to derive and implement higher-order temporal discretizations—both explicit and semi-implicit—for MCF on triangular meshes, with the goal of improving the numerical accuracy of the discrete evolution of the surface by reducing truncation errors and better approximating the continuous PDE solution, particularly for larger time steps. By increasing temporal accuracy, we aim to enhance both the fidelity of the simulated flow and the computational efficiency, mitigating the need for excessively small time steps while maintaining stability.

In this article, I externalize my internal exploratory journey and the insights gained during my last SGI research week under the guidance of my mentors. The article begins with an introduction to MCF and its significance in GP, followed by a brief discussion of the process of discretization, the derivation of the spatial discretization via finite element methods, and the first-order accurate temporal discretization via finite difference methods for MCF. A theoretical comparison of the methods in question is then provided, highlighting their pros and cons. Finally, we derive the second-order accurate temporal discretization of MCF in both the explicit and semi-implicit schemes. Empirical validations of the theoretical results are also presented, along with a humorous fail. The emphasis here is on exposition rather than on taking the shortest path.

Understanding Mean Curvature Flow in \(\mathbb{R}^3 \)

Formal Definition

Let \(M_t \subset \mathbb{R}^3\) represent a family of smoothly embedded surfaces parameterized by time \(t\). The surface at time \(t\) can be described by a smooth mapping:

\[ X(\mathbf{u}, t) : U \times [0, T) \rightarrow \mathbb{R}^3 \]

Here:

  • \(U\) is an open set in \(\mathbb{R}^2\) representing the parameter space of the surface, with local coordinates \( \mathbf{u} = (u_1, u_2)\).
  • \([0,T) \) represents time, where \(T\) is the time until which the flow is considered.
  • \(X(\mathbf{u}, t)\) is the position vector of a point on the surface in \(\mathbb{R}^3\) at time \(t\), and can be explicitly written as: \[ X(\mathbf{u},t) = \begin{pmatrix} X_1(\mathbf{u},t) \\ X_2(\mathbf{u},t) \\ X_3(\mathbf{u},t) \end{pmatrix} \] where \(X_1(\mathbf{u},t), X_2(\mathbf{u},t), X_3(\mathbf{u},t)\) are the coordinate functions that determine the \(x, y,\) and \(z\) components of the position vector in \( \mathbb{R}^3\).

The mean curvature flow is then dictated by the following partial differential equation:

\[ \frac{\partial X}{\partial t}(\mathbf{u}, t) = -H(\mathbf{u}, t)\, \mathbf{n}(\mathbf{u}, t) = \Delta X(\mathbf{u}, t)\]

Where:

  • \(\frac{\partial X}{\partial t}(\mathbf{u}, t)\) is the velocity of the surface at the point \(X(\mathbf{u}, t)\).
  • \(H(\mathbf{u}, t)\) is the mean curvature at the point \(X(\mathbf{u}, t)\).
  • \(\mathbf{n}(\mathbf{u}, t)\) is the unit normal vector at the point \(X(\mathbf{u}, t)\) (outward or inward, depending on the chosen orientation). It indicates the direction in which the surface moves during the flow.
  • \( \Delta \) is the Laplace-Beltrami operator associated with the surface \(M_t\).

Abuse of Notation. For simplicity, any operator \( \phi (\mathbf{u},t) \) at a point parameterized by \( (\mathbf{u},t)\) will be referred to simply as \( \phi \) in this article. The Laplace-Beltrami operator might sometimes be referred to as the Laplacian, depending on my mood.

Mean Curvature \(H\):

The mean curvature \( H\) at a point is defined as the average of the principal curvatures \( k_1 \) and \( k_2 \) at that point on the surface: \[ H= \frac{1}{2} (k_1 + k_2) \]

where \(k_1\) and \(k_2\) are the eigenvalues of the shape operator of the surface (equivalently, of the second fundamental form taken with respect to the first). These principal curvatures measure the maximum and minimum bending of the surface in orthogonal directions at the point in question.

Alternative Expressions for \( H \):

  • Divergence of the Normal Vector Field: \(H\) can be expressed as the negative half of the divergence of the unit normal vector field \(\mathbf{n}\) on the surface: \( H = -\frac{1}{2} \operatorname{div}(\mathbf{n}) \).
  • Trace of the Shape Operator: In addition, \(H\) can be written as half the trace of the shape operator (or Weingarten map) associated with the surface. This representation connects the mean curvature to the linear transformation that describes how the surface bends.

Why Study MCF?

Understanding MCF begins with a simple question: what happens when a surface evolves according to its own curvature? In practice, this flow helps smooth surfaces over time, driving them toward more regular shapes. This process is especially intriguing because it captures essential properties of geometric evolution without external forces, relying solely on the surface’s own shape.

A fundamental result by Huisken, in his 1984 paper “Flow by Mean Curvature of Convex Surfaces into Spheres,” provides deep insight into this phenomenon. Huisken studied the evolution of a special class of hypersurfaces under MCF and proved that any smooth, strictly convex hypersurface becomes asymptotically spherical as it shrinks, collapsing to a point in finite time in an asymptotically self-similar manner. His work highlights how MCF transforms surfaces, regularizing their shape as they shrink and demonstrating the flow’s inherent tendency to round out irregularities. In mathematics, it is common practice to first test an idea or prove a theorem in simple, nice settings (such as convex, compact spaces) before attempting to generalize to more abstract ones.

While the general theory of MCF applies to a wide range of surfaces, it is particularly insightful to consider how a sphere—a familiar, symmetric object—shrinks under the flow. The sphere offers a clean, intuitive example where the underlying mathematics remains manageable, yet it showcases the essential features of MCF, such as shrinking to a point while maintaining symmetry. This serves as an ideal starting point for understanding more complex behaviors in less regular surfaces, grounded in the theoretical framework established by Huisken’s work.

Example: Shrinking Sphere Under MCF

Consider a sphere of radius \(R(t)\) at time \(t\). Under MCF, the sphere’s surface shrinks over time. We want to find how the volume of the sphere changes as it evolves under this flow.

For a sphere of radius \(R(t)\) centered at the origin, the mean curvature is \( H=\frac{2}{R(t)}\) (here we use the sum convention \(H = k_1 + k_2\); the averaged convention of the previous section differs only by a factor of 2, which merely rescales time), and the outward unit normal is \(\mathbf{n}=\frac{X}{R(t)}\).

Substituting these into the MCF equation: \( \frac{\partial X}{\partial t}= -\frac{2}{R(t)} \cdot \frac{X}{R(t)}= -\frac{2X}{R^2(t)}\)

To determine how the radius \(R(t)\) changes over time, observe that each point on the surface can be written as \(X=R(t)\, \mathbf{u}\), where \(\mathbf{u}\) is a unit vector in the direction of \(X\). Thus \( \frac{\partial X}{\partial t} = \frac{\partial (R(t) \mathbf{u})}{\partial t} = \mathbf{u}\frac{dR(t)}{dt} \), and equating this with \(-\frac{2\mathbf{u}}{R(t)}\) gives \(\frac{dR(t)}{dt}= -\frac{2}{R(t)} \).

To solve for \( R(t) \), we separate the variables and integrate: \(\int R(t) \, dR(t) = -\int 2 \, dt \), which gives \(\frac{R^2(t)}{2} = -2t + C \), where \( C \) is the constant of integration. Solving for \( R(t) \) gives \( R(t) = \sqrt{C’ - 4t} \), where \( C’ = 2C \). With the initial condition \( R(0) = R_0 \), we get \( R_0^2 = C’ \), and hence: \(R(t) = \sqrt{R_0^2 - 4t} \)

Volume Shrinkage:

The volume \( V(t) \) of the sphere at time \( t \) is \( V(t) = \frac{4}{3} \pi R^3(t) \). As the sphere shrinks over time, the volume decreases according to the following equation: \( V(t) = \frac{4}{3} \pi \left(R_0^2 - 4t \right)^{3/2} \), and it continues to shrink until \( t = \frac{R_0^2}{4} \), at which point the radius reaches zero and the sphere vanishes.

The accompanying animation illustrates the shrinking process, showing how both the radius and volume decrease over time under the influence of MCF until the sphere vanishes. The MCF drives the surface points of the sphere inward in the direction of the inward normals, leading to a uniform shrinking process. For a perfectly symmetric object like a sphere, the shrinking occurs uniformly, preserving the spherical shape until the radius reduces to zero.

The shrinking radius is computed via the formula derived above, \( R(t)= \sqrt{R_0^2 - 4t} \), and the shrinking becomes faster as time progresses. The plots displayed at the end show how \( R(t) \) and \( V(t) \) evolve over time.
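To make the closed-form solution concrete, here is a minimal Python sketch (assuming numpy and matplotlib are available; the variable names are my own) that evaluates \(R(t)\) and \(V(t)\) up to the vanishing time \(t = R_0^2/4\):

```python
import numpy as np
import matplotlib.pyplot as plt

R0 = 1.0                                 # initial radius (chosen for illustration)
t = np.linspace(0.0, R0**2 / 4.0, 200)   # flow exists until t = R0^2 / 4

R = np.sqrt(R0**2 - 4.0 * t)             # R(t) = sqrt(R0^2 - 4t), derived above
V = (4.0 / 3.0) * np.pi * R**3           # V(t) = (4/3) * pi * R(t)^3

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))
ax1.plot(t, R); ax1.set_xlabel("t"); ax1.set_ylabel("R(t)")
ax2.plot(t, V); ax2.set_xlabel("t"); ax2.set_ylabel("V(t)")
plt.tight_layout()
plt.show()
```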

Shrinking sphere under MCF + evolution over time plots

Since computers operate in a discrete, digital environment, and GP is inherently computational, continuous geometric surfaces must be represented in a form that computers can process. This form is typically a mesh: a discrete approximation of a surface composed of vertices, edges, and faces. And since MCF is intrinsically a continuous model, described by a PDE in space and time, applying it to meshes requires discretizing its continuous operators (the Laplacian and the time derivative) as well.

Discretizing MCF

As implied by the above paragraph, discretization is the process of transforming a continuous mathematical model \(\mathcal{M}\), which is defined over a continuous domain \(\Omega \subset \mathbb{R}^n\) and governed by continuous variables \(u(x)\) and operators \(\mathcal{L}\), into a discrete model \(\mathcal{M}_h\) suitable for computational analysis. This involves:

  • Replacing the continuous domain \(\Omega\) with a finite set of discrete points or elements \(\Omega_h = \{x_i\}_{i=1}^N\).
  • Approximating continuous functions \(u(x)\) by discrete counterparts \(u_h(x_i)\).
  • Substituting continuous operators \(\mathcal{L}\) with discrete analogs \(\mathcal{L}_h\).

The goal is to ensure that the discrete model \(\mathcal{M}_h\) preserves key properties such as consistency, stability, and convergence, so that \(\mathcal{M}_h\) faithfully reflects the behavior of the continuous model \(\mathcal{M}\) as the discretization is refined, i.e., as \(h \rightarrow 0\).

While many continuous models can be discretized, the accuracy and efficiency of the approximation depend on the chosen discretization technique and its implementation. Common discretization techniques include: Finite difference methods (FDM), finite element methods (FEM), finite volume methods (FVM), and spectral methods.

Discretizing the MCF model can be systematically divided into two main components: spatial discretization and temporal discretization. In this article, we employ the FEM for spatial discretization and the FDM for temporal discretization.

Spatial Discretization via FEM

Spatial discretization involves representing the continuous spatial domain of the model as a mesh and approximating the differential operators—primarily the Laplacian, but also the gradient and divergence—as discrete quantities on the mesh in question.

Assume the surface in question is discretized into a mesh consisting of a collection of triangular elements \(\{T_i\}\) with vertices \(v_j\). The first step in approximating the Laplace-Beltrami operator is to define basis functions \(\{\phi_j\}\), one per vertex, defined over the triangles. These are usually piecewise linear: \(\phi_j\) takes the value 1 at vertex \(v_j\) and 0 at all other vertices. These basis functions form a set that allows any function \(X\) on the surface to be approximated as a linear combination of them, \( X \approx \sum_j X_j \phi_j\), where \(X_j = X(v_j)\) is the value of the position function at vertex \(v_j\).

Next, we compute the gradient of each basis function within each triangle; this gradient is constant over the triangle because the functions are linear. The Laplacian operator, which in continuous terms is the divergence of the gradient, is discretized by integrating the products of the gradients of the basis functions over the surface. This leads to the construction of the stiffness matrix \(\mathbf{L}\), where each entry \( \mathbf{L}_{ij}\) is derived from the inner product of the gradients \( \nabla \phi_i\) and \(\nabla \phi_j\), weighted by the cotangents of the angles opposite the edge connecting vertices \(v_i\) and \(v_j\). The diagonal entries \(\mathbf{L}_{ii}\) collect the contributions from all adjacent vertices.

More precisely, for two vertices \(v_i\) and \(v_j\) connected by an edge, the stiffness matrix \(\mathbf{L}\) is computed using the following integral:

\[ \mathbf{L_{ij}}=\int_{\sigma} \nabla \phi_i \cdot \nabla \phi_j dA\]
where \(\sigma\) is the surface, and \(dA\) is the area element.

For piecewise linear basis functions on a triangular mesh, the gradients in question are constant within each triangle, so the integral reduces to a sum over the triangles \(T_k\) that contain the edge connecting \(v_i\) and \(v_j\):

\[ \mathbf{L_{ij}}= \sum_{T_k \in N(v_i,v_j)} (\nabla \phi_i \cdot \nabla \phi_j ) \text{Area} (T_k) \]

Where \( N(v_i,v_j)\) denotes the set of triangles sharing the edge between \(v_i\) and \(v_j\). Within a single triangle \(T_k\), the inner product is given by the cotangent of the angle \(\theta^{k}_{ij}\) opposite the shared edge:
\[\nabla \phi_i \cdot \nabla \phi_j = \frac{-1}{2\, \text{Area}(T_k)}\cot{\theta^{k}_{ij}} \]

Summing over the (at most two) triangles sharing the edge, whose angles opposite the edge we denote \(\alpha_{ij}\) and \(\beta_{ij}\), the entries \( \mathbf{L}_{ij} \) are:

\[\mathbf{L}_{ij}= -\frac{1}{2} (\cot{\alpha_{ij}}+\cot{\beta_{ij}}) \]

Finally, the diagonal entries \(\mathbf{L}_{ii}\) collect the contributions from all adjacent vertices so that each row sums to zero:
\[\mathbf{L}_{ii}= -\sum_{j \neq i} \mathbf{L}_{ij} = \sum_{j \neq i} \frac{1}{2} (\cot{\alpha_{ij}}+\cot{\beta_{ij}})\]

To balance the discretization, we also need the mass matrix \(\mathbf{M}\), which arises from integrating the basis functions themselves. Formally, the entries of the mass matrix \(\mathbf{M}\) are given by:
\[ \mathbf{M_{ij}} = \int_{\sigma} \phi_i \phi_j dA\]

In simple cases¹, the mass matrix is typically lumped into a diagonal matrix, with each entry written as:
\[ \mathbf{M}_{ii} = \frac{1}{3} \sum_{T \ni v_i} \text{Area}(T),\]

where the sum is over all triangles \(T\) that share the vertex \(v_i\).
The spectrum of the discretized Laplace-Beltrami operator is characterized by the generalized eigenvalue problem \(\mathbf{L}X = \lambda \mathbf{M} X\), where \(\mathbf{L}\) encodes the differential operator and \( \mathbf{M}\) ensures that the discretization respects the geometry of the mesh.

Both matrices are sparse; the mass matrix \( \mathbf{M}\) is positive definite, and the stiffness matrix \( \mathbf{L}\) is positive semi-definite.

Intuitively,

  • The stiffness matrix \(\mathbf{L}\) captures the geometric and differential properties of the surface. It represents how the curvature is distributed over the mesh by measuring how the gradients of the basis functions (associated with each vertex) interact with each other. The entries of this matrix are weighted by the angles in the triangles, which essentially encode how “stiff” or “resistant” the mesh is to deformation. In other words, it determines how much the shape of the surface resists change when forces (like curvature flow) are applied.
  • The mass matrix \(\mathbf{M}\) accounts for the distribution of area or “mass” over the surface. It ensures that the discretization respects the surface’s geometry by distributing weight across the vertices according to the areas of the surrounding triangles. This matrix is often diagonal, with each entry corresponding to the area associated with a vertex, making sure that the mesh’s physical properties, like area and volume, are properly balanced in computations.

Assembling both matrices, the discrete weak form of the flow reads \(\mathbf{M}\dot{X} = -\mathbf{L}X\), i.e., \(\Delta \approx -\mathbf{M}^{-1} \mathbf{L}\) under the stiffness convention above. Following common geometry-processing practice, we absorb the sign into \(\mathbf{L}\) (redefining it as its negation) and simply write \(\Delta \approx \mathbf{M}^{-1} \mathbf{L}\) in what follows.
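As a concrete reference, below is a minimal sketch of how \(\mathbf{M}\) and \(\mathbf{L}\) might be assembled with numpy/scipy for a mesh given as a vertex array V (shape n×3) and a face array F (shape m×3). This is an illustrative implementation of the cotangent formulas above, not code from any particular library; the function name and array layout are my own assumptions.

```python
import numpy as np
import scipy.sparse as sp

def cotan_laplacian(V, F):
    """Assemble the lumped mass matrix M and the cotangent matrix L,
    with the sign absorbed so that Delta ~ M^{-1} L (rows of L sum to
    zero and the off-diagonals carry the positive cotangent weights)."""
    n = V.shape[0]
    L = sp.lil_matrix((n, n))
    M_diag = np.zeros(n)
    for i, j, k in F:
        u, v, w = V[j] - V[i], V[k] - V[i], V[k] - V[j]
        area = 0.5 * np.linalg.norm(np.cross(u, v))
        # cotangent of the interior angle at each vertex;
        # each angle sits opposite exactly one edge of the triangle
        cot_i = np.dot(u, v) / (2.0 * area)    # angle at v_i, opposite edge (j, k)
        cot_j = np.dot(-u, w) / (2.0 * area)   # angle at v_j, opposite edge (i, k)
        cot_k = np.dot(v, w) / (2.0 * area)    # angle at v_k, opposite edge (i, j)
        for (a, b), c in (((j, k), cot_i), ((i, k), cot_j), ((i, j), cot_k)):
            L[a, b] += 0.5 * c
            L[b, a] += 0.5 * c
            L[a, a] -= 0.5 * c                 # keeps each row summing to zero
            L[b, b] -= 0.5 * c
        M_diag[[i, j, k]] += area / 3.0        # barycentric (lumped) mass
    return sp.diags(M_diag).tocsc(), L.tocsr()
```

Usage would simply be M, L = cotan_laplacian(V, F); both matrices are sparse, as noted above.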

Remark: In discretizing the Laplacian operator, several other approaches can be employed as well, such as divided differences, higher-order elements, discrete exterior calculus, etc. In any case, no discretization of the Laplacian can preserve every natural property of its ideal continuous form.

Temporal Discretization via FDM

Temporal discretization refers to the approximation of the time-dependent part of the MCF model (the time derivative \(\frac{\partial X}{\partial t}\)). This step is critical for evolving the surface over time according to the mean curvature dynamics. Three common families of approaches are used for this type of discretization: explicit methods (e.g., explicit Euler), implicit methods (e.g., implicit Euler), and semi-implicit methods (e.g., Desbrun et al., 1999).

  • Explicit Euler Method: In this method (also known as the forward Euler method), the future positions of the mesh vertices are computed from the current positions and the mean curvature at those positions. While simple to implement, it imposes stability constraints on the time step size. The discretized vertex update rule is given by:
    \[\frac{X^{(k+1)} – X^{(k)}}{\Delta t} = \mathbf{M}^{-1} \mathbf{L} X^{(k)} \]
    Rearranging this, we get: \[ X^{(k+1)} = X^{(k)} + \Delta t \mathbf{M}^{-1}\mathbf{L} X^{(k)} \] where \(X^{(k)}\) and \(X^{(k+1)}\) are the matrices of vertex positions at time steps \(k\) and \(k+1\), respectively.
  • Implicit Euler Method: Also known as backward Euler, this method involves solving a system of equations at each time step; it provides greater stability than the explicit Euler method and allows for larger time steps. The discretized vertex update rule is given by: \[ X^{(k+1)} \approx X^{(k)} + \Delta t \mathbf{M}^{-1}\mathbf{L}(X^{(k+1)})\, X^{(k+1)} \] This equation is fully implicit because the Laplacian matrix itself depends on the vertex positions at the time step one is trying to solve for, making the system nonlinear and difficult to solve.
  • Desbrun et al.’s Semi-Implicit Method: Desbrun et al. (1999) proposed a semi-implicit method, which is a compromise between the simplicity of explicit methods and the stability of implicit methods. The idea is to treat the Laplacian and the vertex positions in the following manner: instead of evaluating the Laplacian at the next time step (which makes the equation nonlinear), they assemble it at the current time step, which simplifies the problem, while the vertex positions are still updated at the next time step. Their discretized vertex update rule is given by: \[ X^{(k+1)} \approx X^{(k)} + \Delta t\,\mathbf{M}^{-1}\mathbf{L}(X^{(k)})\, X^{(k+1)} \] This equation is still implicit in \(X^{(k+1)}\), but the Laplacian is evaluated at the known positions \(X^{(k)}\), making the system linear and easier to solve. The update rule can be rearranged into: \[X^{(k+1)} \approx (I- \Delta t\,\mathbf{M}^{-1}\mathbf{L}(X^{(k)}) )^{-1}X^{(k)} \] (a code sketch of this step appears after the remark below). This method offers good stability. However, it is not as accurate as fully implicit methods, because it only evaluates the Laplacian at the current positions. It smooths the mesh, but not as precisely as solving the full nonlinear system. In addition, it does not generalize to other geometric flows that require more complex handling of nonlinearities.
  • Adaptive Time Stepping: This approach adjusts the time step size dynamically based on the evolution of the surface, allowing for finer resolution during critical changes and coarser resolution during smoother phases. Progyan Das from my team worked on this approach.

Remark. The derivations of the above update rules are tacitly included in the section where we derive their second-order forms.
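For reference, here is a minimal sketch of one explicit step and one Desbrun et al. semi-implicit step, reusing the hypothetical cotan_laplacian helper from the spatial discretization section. The sparse LU factorization is one standard way to solve the semi-implicit linear system; any sparse solver would do.

```python
from scipy.sparse.linalg import splu

def explicit_step(V, F, dt):
    """One forward-Euler MCF step: X <- X + dt * M^{-1} L X."""
    M, L = cotan_laplacian(V, F)
    return V + dt * (L @ V) / M.diagonal()[:, None]  # M is diagonal, so M^{-1} is cheap

def semi_implicit_step(V, F, dt):
    """One Desbrun et al. step: solve (M - dt * L(X^k)) X^{k+1} = M X^k."""
    M, L = cotan_laplacian(V, F)
    return splu((M - dt * L).tocsc()).solve(M @ V)
```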

Comparative Analysis and Empirical Validation: Euler Explicit vs. Desbrun et al.’s

  1. Accuracy
    • Euler Explicit Method:
      • Order of Accuracy: This method is first-order accurate in time, meaning that the global error in the solution decreases linearly with the time step \(\Delta t\). If the exact solution at time \(t_n\) is denoted by \(X(t_n)\) and the numerical solution by \(X^{(n)}\), the error \(E_n = \lVert X(t_n) - X^{(n)} \rVert\) satisfies \(E_n \approx C \Delta t\), where \(C\) is a constant dependent on the problem. This linear relationship implies that halving the time step approximately halves the error.
      • Error Propagation: Errors tend to accumulate more rapidly, especially for larger time steps, because this method updates the solution based only on information from the current time step. If the time step \(\Delta t\) is too large, the method may not accurately capture the evolution of the curvature, leading to significant errors that compound over time. The update can be expressed as \(X^{(k+1)}=X^{(k)}+ \Delta t\, F(X^{(k)})\), where \(F\) is the update function. If \(\Delta t\) is too large, the local truncation error, which is \(O(\Delta t^2)\), becomes significant, causing larger cumulative errors.
      • Handling of Complex Geometries: The explicit method will likely struggle with highly irregular meshes—a direct consequence of the previous point—leading to larger errors in curvature computation.
    • Desbrun et al. Semi-Implicit Method:
      • Order of Accuracy: This method is also first-order accurate in time: it is essentially a modified backward Euler scheme in which the implicitness is applied to the spatial (Laplacian) term, while the time discretization remains first-order.
      • Error Propagation Reduction: The method implicitly handles the curvature of the mesh by solving a linear system at every update, which incorporates more information about the solution at the next time step. This implicit approach effectively reduces errors, and stabilizes the solution especially when larger time steps are used compared to Euler’s explicit method.
      • Numerical Diffusion: Moreover, it offers better control over numerical diffusion—a phenomenon where fine details of the mesh are smoothed out excessively—compared to the explicit method, leading to more accurate smoothing. Numerical diffusion can be described mathematically by how the curvature smoothing term affects the higher-order modes of the solution, and here the implicit nature of the method helps preserve these modes more effectively than Euler’s explicit method.
  2. Stability
    • Euler Explicit Method:
      • Conditionally Stable: The stability here depends on the time step size; it requires small time steps to maintain stability.
      • CFL Condition: The time step must satisfy the Courant-Friedrichs-Lewy (CFL) condition, which can severely restrict the time step size, especially for fine meshes. The CFL condition constrains the time step to scale with the square of the mesh spacing \(h\), i.e., \(\Delta t = O(h^2)\). This means that as the mesh becomes finer, the time step must decrease quadratically, which significantly increases the number of iterations required for convergence.
    • Desbrun et al. Semi-Implicit Method:
      • Unconditionally Stable: Allows larger time steps without sacrificing stability. This is a key advantage for computational efficiency.
      • Robustness: More stable under large deformations or irregular meshes, making it suitable for a broader range of applications than the explicit method.
  3. Computational Efficiency and Memory Usage
    • Euler Explicit Method:
      • Efficiency: Simpler to implement and faster per iteration due to direct updates, but requires more iterations for convergence due to the small time steps needed.
      • Memory Usage: Lower memory requirements since it does not require solving linear systems.
      • Parallelization: Easier to parallelize due to the independence of the update steps.
    • Desbrun et al. Semi-Implicit Method:
      • Efficiency: More computationally intensive per iteration due to the need to solve linear systems, but fewer iterations may be needed due to larger permissible time steps.
      • Memory Usage: Higher memory consumption due to the storage of matrices for linear system solving.
      • Parallelization: More challenging to parallelize because of the dependencies introduced by solving the linear system.
  4. Implementation Complexity
    • Euler Explicit Method:
      • Complexity: Conceptually simpler and easier to implement. It involves straightforward updates without the need for solving linear systems.
      • Dependencies: Minimal dependencies between updates, making it a more accessible method for quick implementations.
    • Desbrun et al. Semi-Implicit Method:
      • Complexity: More complex to implement due to the need to solve large, sparse linear systems at each time step.
      • Dependencies: Involves matrix assembly and inversion, which can introduce additional challenges in implementation.
  5. Parameter Sensitivity
  • Euler Explicit Method:
    • Sensitivity: Highly sensitive to time step size. Small changes can significantly affect stability and accuracy.
  • Desbrun et al. Semi-Implicit Method:
    • Sensitivity: Less sensitive to time step size, allowing for greater flexibility in choosing time steps.

Overall Assessment:

  • Euler Explicit Method is advantageous for its simplicity, ease of implementation, and parallelization potential. However, it is limited by stability constraints, accuracy issues, and higher sensitivity to parameter choices.
  • Desbrun et al. Semi-Implicit Method offers superior stability and accuracy compared to the explicit method, as well as reduced numerical diffusion, at the cost of increased computational complexity and memory usage. It is better suited for applications requiring robust and accurate smoothing, particularly in the context of complex or irregular meshes.

Oh… this felt like eating five horrible McDonald’s cheeseburgers. 🍔🍔🍔🍔🍔 Right? So, let’s compress this previous analysis into a nice compact table for quick reference.

| Aspect | Euler Explicit Method | Desbrun et al. Semi-Implicit Method |
|---|---|---|
| Accuracy | First-order accurate in time. Higher error accumulation, especially for large time steps. Struggles with complex geometries. | First-order accurate in time. Better error reduction, especially for large time steps. Better control over numerical diffusion. |
| Stability | Conditionally stable. Requires small time steps, dictated by the CFL condition. | Unconditionally stable. Allows larger time steps without sacrificing stability. |
| Computational Efficiency | Simple and fast per iteration. Inefficient for fine meshes due to the small time step requirement. | Computationally more expensive due to solving linear systems. Efficient for larger time steps. |
| Memory Usage | Lower memory usage. | Higher memory usage due to storing and solving linear systems. |
| Implementation Complexity | Relatively simple to implement. | More complex due to the need to solve linear systems. |
| Parallelization | Easier to parallelize due to independent updates. | More challenging to parallelize due to solving linear systems. |
| Parameter Sensitivity | Sensitive to time step size (CFL condition). | Less sensitive to time step size, allowing more flexibility. |
| Numerical Diffusion | Higher numerical diffusion, especially for large time steps. | Better control over numerical diffusion, preserving more detail. |

Euler explicit vs. Desbrun et al. semi-implicit methods: compact comparative assessment

The next experiment empirically illustrates the stability and accuracy tradeoffs presented above for the two methods in question, since the main aim of the project is to improve the accuracy and stability of MCF applied to triangle meshes by incorporating higher-order terms in the time integration process.

Experimental Design:

  1. Mesh Preparation:
    • Target Mesh: Load a 3D mesh model, and store it as the target mesh.
    • Noisy Mesh: Add a controlled amount of noise to the target mesh to simulate imperfections, and store the output as the noisy mesh.
  2. Application of MCF Methods:
    • Apply each MCF method to the noisy mesh across a pre-determined range of time steps. The number of iterations for both methods is fixed (10 iterations) for each time step. Choose the range of time steps so that it spans very small values (satisfying the CFL condition) to relatively large ones, to allow for a comprehensive analysis of the methods’ behavior under different conditions.
  3. Data Logging: For each time step, record the following data for each method (a minimal sketch of this logging loop appears after this list):
    • Error Metric: Frobenius norm of the difference between the smoothed mesh and the original target mesh.
    • Stability Metric: the maximum \(\Delta t\) for which the error stays below a user-specified error threshold.
    • Computational Time: Time taken for execution.
  4. Output Plots
    • Error vs. Time Steps: The error is plotted against the time step size on a log-log scale. The slope of this curve will indicate the convergence rate:
      • Steeper Slope: Indicates faster convergence and higher accuracy.
      • Flatter Slope: Suggests slower convergence and potential inaccuracies.
    • Visual and Quantitative Evaluation: The final smoothed meshes are presented for visual comparison against the target mesh.
    • Time Step Limitations: The maximum time step for which each method remains stable is identified.
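Here is a minimal sketch of the logging loop described above, assuming a stepper with the signature used in the earlier sketches; the Gaussian noise model, the function name, and the parameter defaults are my own illustrative choices.

```python
import time
import numpy as np

def run_experiment(V_target, F, step_fn, dts, n_iters=10, noise=0.01, seed=0):
    """Log the Frobenius-norm error and runtime of an MCF stepper over time steps."""
    rng = np.random.default_rng(seed)
    V_noisy = V_target + noise * rng.standard_normal(V_target.shape)
    errors, runtimes = [], []
    for dt in dts:
        V = V_noisy.copy()
        t0 = time.perf_counter()
        for _ in range(n_iters):                       # fixed iteration count per dt
            V = step_fn(V, F, dt)
        runtimes.append(time.perf_counter() - t0)
        errors.append(np.linalg.norm(V - V_target))    # Frobenius norm of the difference
    return np.array(errors), np.array(runtimes)

# e.g., errs, secs = run_experiment(V, F, semi_implicit_step, np.geomspace(1e-6, 1e-1, 20))
```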
Frobenius norm vs time steps size for Euler’s explicit and Desbrun et al.’s semi-implicit methods
Runtime (seconds) vs time steps size for Euler’s explicit and Desbrun et al.’s semi-implicit methods

Interpretation of Results:

  • The accuracy plot shows that the explicit method tends to produce much higher errors than Desbrun et al.’s, especially at larger time steps, confirming that it requires small time steps to maintain stability. With Desbrun et al.’s method, the error remains essentially constant across the range of time steps tested, which confirms that the method’s accuracy and stability are not affected by the choice of time step within this range; it also verifies its robustness and its tendency to suppress error propagation as the time step size increases. Regarding the rate of convergence, the slope of the error curve versus time step for Euler’s explicit method is \(m=1.29\), while Desbrun et al.’s method converges at rate \(m=1\).
  • In this experiment, we set the error threshold to 3. It turns out that the maximum step size \(\Delta t\) for which the error stays below this threshold is 2.2204e-16 for Euler’s explicit method, while for Desbrun et al.’s it is the largest time step in our tested range.

Inspiration: This experimental design was inspired by a task I did during my second SGI project on 2D Differentiable Representation of Curve Networks, under Mikhail Bessmeltsev. 🙂

The Case for Higher-Order Time Discretization

As demonstrated by both theory and practice, even robust methods like Desbrun et al.’s semi-implicit method for MCF face limitations with first-order time discretization. While this category of methods offers a compromise between explicit and fully implicit methods, first-order discretization still imposes constraints on accuracy in numerical simulations. These limitations stem from the truncation errors inherent in first-order approximations.

First-order methods approximate the time derivative using only vertex velocities (the first derivative of position with respect to time) and disregard higher-order terms, such as acceleration (the second derivative). This omission means they fail to account for how the geometry of the surface is changing or accelerating locally, since higher derivatives encode information about local curvature. If we conceptualize the next iterate as \(X^{(k+1)}_i = \mu (X_i^{(k)}, I^{(k)})\), where \(I^{(k)}\) is an information vector encoding local geometric properties of the surface at the current iterate \(X_i^{(k)}\), it becomes clear that the more detailed the information we incorporate into \(I^{(k)}\), the more accurate the next state \(X^{(k+1)}_i \) will be.

In other words, higher-order discretizations, such as second-order ones, significantly reduce truncation errors, improve convergence, and represent the geometry more accurately over time for larger time steps, which contributes to a more economical use of computational resources (e.g., a reduced number of iterations).
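To see the effect of the extra term concretely, we can return to the shrinking sphere, where everything is known in closed form: \(dR/dt = -2/R\), hence \(d^2R/dt^2 = -4/R^3\), and the exact radius is \(R(t)=\sqrt{R_0^2-4t}\). The toy check below (an arithmetic illustration of my own, not a mesh experiment) compares first- and second-order Taylor predictions of the radius after a single step against the exact value; the second-order error shrinks roughly one power of \(\Delta t\) faster:

```python
import numpy as np

R0 = 1.0
for dt in (0.01, 0.02, 0.04):
    exact  = np.sqrt(R0**2 - 4.0 * dt)       # R(dt) = sqrt(R0^2 - 4 dt)
    first  = R0 - (2.0 / R0) * dt            # R0 + dt * R'(0)
    second = first - (2.0 / R0**3) * dt**2   # ... + (dt^2 / 2) * R''(0), R'' = -4/R^3
    print(f"dt={dt}: first-order err={abs(first - exact):.2e}, "
          f"second-order err={abs(second - exact):.2e}")
```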

Deriving Higher-order Discretizations for MCF

In this section, we derive the second-order accurate (in time) vertex update rules for the explicit forward Euler scheme and the semi-implicit scheme due to Desbrun et al. Starting from the continuous form of the MCF equation, we use a Taylor expansion to approximate the position of a point on the surface up to the second-order term, and for Desbrun et al.’s scheme we make use of the Neumann series in our derivation.

Recall from the first section that MCF, in its continuous form, is described by the following equation:

\[\frac{\partial X(u,t)}{\partial t} = \Delta X(u,t)\]

where \( X(u,t) \in \mathbb{R}^3 \) is the position of a point on the surface at parameter \( u \) and time \( t \), and \(\Delta\) is the Laplace-Beltrami operator.

Now, we apply a Taylor expansion of \( X(u,t) \) around time \( t=t_k\):

\[ X(u, t_k + \Delta t) = X(u, t_k) + \Delta t \frac{\partial X(u,t)}{\partial t}| _{t=t_k}+ \frac{\Delta t^2}{2} \frac{\partial^2 X(u,t)}{\partial t^2}| _{t=t_k} + \mathcal{O}(\Delta t^3) \]

Substituting the MCF equation \( \frac{\partial X}{\partial t} = \Delta X \): \[ X(u, t_k + \Delta t) = X(u, t_k) + \Delta t \Delta X(u,t) | _{t=t_k}+ \frac{\Delta t^2}{2} \frac{\partial}{\partial t} \left( \Delta X(u,t) \right)| _{t=t_k} + \mathcal{O}(\Delta t^3) \]

where \[\frac{\partial^2 X(u,t)}{\partial t^2}\Bigg |_{t=t_k} = \frac{\partial}{\partial t} \left( \Delta X(u,t) \right) \Bigg |_{t=t_k}\] This follows from the definition of the Laplace-Beltrami operator and Schwarz’s theorem (also known as Clairaut’s theorem on the equality of mixed partials).

Now, \[ \frac{\partial}{\partial t} \left( \Delta X(u,t) \right) \Bigg |_{t=t_k} = \Delta \left( \frac{\partial X(u,t)}{\partial t} \right) \Bigg |_{t=t_k}= \Delta \left( \Delta X(u,t) \right) \Bigg |_{t=t_k} = \Delta^2 X(u,t) \Bigg |_{t=t_k} \]

By substituting this in Taylor’s expansion, we get the continuous second-order expansion for \( X(u,t)\) at \(t_k\):

\[ X(u, t_k + \Delta t) = X(u, t_k) + \Delta t \Delta X(u,t) \Bigg |_{t=t_k} + \frac{\Delta t^2}{2} \Delta^2 X(u,t) \Bigg |_{t=t_k} + \mathcal{O}(\Delta t^3) \qquad (*) \]

Now let \( X^{(k)}_i\) and \( X^{(k+1)}_i \) denote the position vector of vertex \(i\) at times \( t_k \) and \( t_{k+1} \), respectively, and let \( \Delta t = t_{k+1}-t_k\). Using the Taylor expansion \((*)\), the second-order vertex update rule becomes:

\[ X^{(k+1)}_i\approx X^{(k)}_i + \Delta t \Delta X^{(k)}_i + \frac{\Delta t^2}{2} \Delta^2 X^{(k)}_i \]

The spatial discretization we use in this article is \(\Delta \approx \mathbf{M}^{-1}\mathbf{L}\), so the vertex update rule for forward Euler becomes:
\[ X^{(k+1)}_i\approx X^{(k)}_i + \Delta t \mathbf{M}^{-1}\mathbf{L} X^{(k)}_i + \frac{\Delta t^2}{2} (\mathbf{M}^{-1}\mathbf{L})^2 X^{(k)}_i \]

After doing some algebra, we reach the following matrix form:

\[ X^{(k+1)} \approx \left[ I + \Delta t \mathbf{M}^{-1}\mathbf{L} + \frac{\Delta t^2}{2} (\mathbf{M}^{-1}\mathbf{L})^2 \right] X^{(k)} \]
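A minimal sketch of this second-order explicit update follows, again reusing the hypothetical cotan_laplacian helper; the operator is applied twice rather than forming \((\mathbf{M}^{-1}\mathbf{L})^2\) explicitly:

```python
def explicit_step_2nd(V, F, dt):
    """Second-order explicit MCF step:
    X <- [I + dt A + (dt^2 / 2) A^2] X, with A = M^{-1} L assembled at X."""
    M, L = cotan_laplacian(V, F)
    d = M.diagonal()[:, None]
    AX  = (L @ V) / d            # A X
    AAX = (L @ AX) / d           # A^2 X, by applying the operator twice
    return V + dt * AX + 0.5 * dt**2 * AAX
```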

For the semi-implicit scheme due to Desbrun et al., things are a bit trickier. It should be easy by now to derive, via a Taylor expansion, the first-order vertex update rule presented earlier, which takes the matrix form:

\[X^{(k+1)} \approx (I- \Delta t\,\mathbf{M}^{-1}\mathbf{L}(X^{(k)}) )^{-1} X^{(k)} \]

To derive the second-order term for this scheme, we expand the inverse matrix \( \left(I - \Delta t \mathbf{M}^{-1} \mathbf{L}(X^{(k)}) \right)^{-1} \) using a Neumann series. For this to work, we must ensure that the convergence condition holds, i.e., that the spectral radius is strictly less than 1: \[ \rho (\Delta t \mathbf{M}^{-1}\mathbf{L}(X^{(k)}) )<1 \]

This means \(\Delta t\) must be chosen small enough, or the structure of \(\mathbf{M}^{-1}\mathbf{L}(X^{(k)})\) must ensure that its eigenvalues are small.

Assuming this holds, the Neumann series expansion for \( \left(I - \Delta t \mathbf{M}^{-1}\mathbf{L}(X^{(k)}) \right)^{-1} \) can be written as: \[ \left(I - \Delta t \mathbf{M}^{-1} \mathbf{L}(X^{(k)})\right)^{-1} = I + \Delta t \mathbf{M}^{-1} \mathbf{L}(X^{(k)})+ \Delta t^2 \left(\mathbf{M}^{-1} \mathbf{L}(X^{(k)})\right)^2 + \mathcal{O}(\Delta t^3)\]

Substituting this approximation into the semi-implicit update rule, we get:

\[ X^{(k+1)} \approx \left( I + \Delta t \mathbf{M}^{-1} \mathbf{L}(X^{(k)}) + \Delta t^2 \left( \mathbf{M}^{-1} \mathbf{L}(X^{(k)}) \right)^2 \right) X^{(k)} \qquad (***)\]

Discussion. The advantage of using the Neumann series in deriving the second-order time discretization is that it allows us to approximate \( \left(I - \Delta t \mathbf{M}^{-1} \mathbf{L}(X^{(k)}) \right)^{-1} \) without directly computing the matrix inverse, which can be computationally expensive for large meshes. Instead, the expansion provides a series of manageable terms, so we can economically exploit the accuracy benefits gained from adding the higher-order terms. With that said, the major disadvantage is that if \(\Delta t\) becomes too large, the Neumann series may fail to converge or lead to unstable behavior, somewhat limiting its effectiveness for semi-implicit schemes over larger intervals. However, it would not be correct to say that the stability issue is impossible to circumvent; we discuss this in a subsequent article.
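Under the same assumptions as the earlier sketches, the Neumann-based update \((***)\) might look as follows; note the full \(\Delta t^2\) coefficient (rather than \(\Delta t^2/2\)) coming from the series expansion, and that the sketch is only meaningful when \(\rho(\Delta t\, \mathbf{M}^{-1}\mathbf{L}(X^{(k)})) < 1\):

```python
def semi_implicit_step_2nd(V, F, dt):
    """Second-order semi-implicit step via the Neumann expansion:
    X <- (I + dt A + dt^2 A^2) X, with A = M^{-1} L(X^k).
    No linear solve or explicit inverse is needed."""
    M, L = cotan_laplacian(V, F)   # L assembled at the current positions X^k
    d = M.diagonal()[:, None]
    AX  = (L @ V) / d
    AAX = (L @ AX) / d
    return V + dt * AX + dt**2 * AAX
```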

An Alternative Discretization Based on Huisken’s (1984) MCF Evolution Equations

Earlier in the article, we mentioned Huisken’s (1984) landmark result on the behavior of strictly convex smooth hypersurfaces in Euclidean space under MCF. To establish this result, Huisken derived several key equations that rigorously describe how various geometric quantities change over time as general surfaces evolve under MCF.

  1. \( \frac{\partial n}{\partial t}=\nabla H \)
  2. \( \frac{\partial H}{\partial t}=\Delta H+|A|^2 H =\Delta H + (H^2-2K)H. \) where \(A\) is the second fundamental form, and \(K\) is the Gaussian curvature.

These are called the surface evolution equations. The first equation describes how the unit normal vector \(n\) evolves over time, linking its rate of change to the gradient of the mean curvature \(H\). The second equation tracks the evolution of \(H\) itself as the surface changes.

From these, and writing the flow as \(\frac{\partial X}{\partial t} = H\mathbf{n}\) (with \(\mathbf{n}\) oriented so that the motion is inward), we can derive the following equation for the second derivative of the surface position \(X\) with respect to time:

\[\frac{\partial^2 X}{\partial t^2} = \frac{\partial}{\partial t}(H\mathbf{n}) = \frac{\partial H}{\partial t}\mathbf{n} + H\frac{\partial \mathbf{n}}{\partial t}= (\Delta H +(H^2-2K)H)\mathbf{n}+H\nabla H\]

This expression consists of geometric quantities that can be approximated on a mesh—though they tend to be noisy. We can also write the equation in an alternative form:

\[\frac{\partial^2 X}{\partial t^2}=(\Delta H)\mathbf{n} + (H^2-2K)\Delta X + H\nabla H\]

Why is this important here? Since we are discussing higher-order discretizations of MCF, we are interested in discovering new equivalent (and hopefully economical) ways to describe the temporal derivatives in question. Huisken (1984) provides some, and thus a natural question arises: can we discretize the components \(\mathbf{n}, H, \Delta H, \nabla H, K\) of \( \frac{\partial^2 X}{\partial t^2}\) and \(\frac{\partial X}{\partial t}\) to derive a second-order discretization for MCF using the Taylor series? The answer is yes.

For example, the Gaussian curvature \(K\) can be discretized using the angle-deficit method. The normal vector \(\mathbf{n}\) at a vertex can be estimated as the area-weighted average of the normals of the adjacent triangles. The mean curvature normal \(H\mathbf{n}\) can be obtained by applying \(\mathbf{M}^{-1}\mathbf{L}\) to the vertex positions of the mesh. The gradient \(\nabla H\) can be approximated using finite differences or from the stiffness matrix \(\mathbf{L}\) and adjacent vertex data, while \(\Delta H\) can be discretized by applying \(\mathbf{M}^{-1}\mathbf{L}\) to the discrete mean curvature \(H\).

Using these discretized quantities, we arrive at the following vertex update formula: \[ X_i^{(k+1)} \approx X_i^{(k)}+\Delta t\, H_i \mathbf{n}_i + \frac{\Delta t^2}{2} \left( (\mathbf{L} H)_i\, \mathbf{n}_i + (H_i^2-2K_i)\,\Delta X_i^{(k)} + H_i\nabla H_i \right) \] However, as mentioned, this discretization approach is often not preferred due to the significant noise in the quantities \(\mathbf{n}, H, \Delta H, \nabla H\), and \(K\) on meshes.
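As one concrete instance of these discrete quantities, here is a minimal sketch of the angle-deficit approximation of the Gaussian curvature, \(K_i = \big(2\pi - \sum \theta\big)/A_i\), where the sum runs over the interior angles incident to vertex \(i\) and \(A_i\) is its barycentric area (the function name and array layout are my own assumptions):

```python
import numpy as np

def angle_deficit_K(V, F):
    """Per-vertex Gaussian curvature via the angle-deficit formula
    K_i = (2*pi - sum of incident angles) / (barycentric vertex area)."""
    n = V.shape[0]
    deficit = np.full(n, 2.0 * np.pi)
    area = np.zeros(n)
    for i, j, k in F:
        u, v, w = V[j] - V[i], V[k] - V[i], V[k] - V[j]
        A = 0.5 * np.linalg.norm(np.cross(u, v))
        # interior angle between two edge vectors, clipped for numerical safety
        ang = lambda a, b: np.arccos(np.clip(
            np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0))
        deficit[i] -= ang(u, v)        # angle at v_i
        deficit[j] -= ang(-u, w)       # angle at v_j
        deficit[k] -= ang(-v, -w)      # angle at v_k
        area[[i, j, k]] += A / 3.0
    return deficit / area
```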

Visualizing \(n, H, \Delta H, \nabla H,\) and \(K\) on a Mesh

Visualizing mean curvature and normals per vertex on a sphere mesh (r=0.5), with color mapping based on mean curvature.
Gradient of mean curvature H

The next experiment aims to visualize higher-order effects in MCF by plotting small arcs at each mesh vertex. These arcs are defined as

\[ f(h) = X_i + (H\mathbf{n})\big |_{X=X_i}\, h + \frac{h^2}{2}\, \frac{\partial}{\partial t}(H\mathbf{n})\Big |_{X=X_i}, \qquad \frac{\partial}{\partial t}(H\mathbf{n}) = \left(\Delta H +(H^2-2K)H\right)\mathbf{n}+H\nabla H \]

where the first-order coefficient \(H\mathbf{n}\) is the mean curvature normal \(\frac{\partial X}{\partial t}\) and the second-order coefficient \(\frac{\partial^2X}{\partial t^2}\) comes from Huisken’s calculations above.

The idea of this experiment was suggested by Prof. Justin Solomon on day two of the project!

The first-order term moves the surface in the direction of the normal, scaled by the mean curvature, so regions of higher curvature move farther than flatter regions. The second-order term refines this with curvature-dependent corrections: it can enhance or counteract the first-order displacement, affecting the arc’s bending and potentially leading to different geometric changes. The second-order term can indeed add accuracy to the displacement, providing a more precise description of the surface evolution. However, higher-order terms are also more sensitive to mesh noise and discretization errors, which can introduce instabilities or oscillations, particularly in regions with poor mesh quality; such instabilities are amplified in regions of high curvature, where numerical errors from the second-order term may dominate. In our case, these instabilities show up as exaggerated displacements, resulting in disproportionately large polylines at certain vertices. With simpler meshes, this instability is not as amplified as it is with complex ones.

Implementation of the Second-Order Semi-Implicit Method (Desbrun et al., 1999):

The following are the output results of our second-order version of Desbrun et al.’s semi-implicit method \((***)\).

An animation showing the evolution of the surface under the second-order Desbrun et al. MCF method, 50 iterations.

Yay or Nay: Circular Arc-Based Discretizations for Curvature-Driven Flows

In the Taylor expansion used to derive the vertex-update rule for MCF, the position \(X(u,t)\) of each vertex is typically approximated by a quadratic polynomial in time:

\[ X(u, t_k + \Delta t) = X(u, t_k) + \Delta t \frac{\partial X(u,t)}{\partial t}| _{t=t_k}+ \frac{\Delta t^2}{2} \frac{\partial^2 X(u,t)}{\partial t^2}| _{t=t_k} + \mathcal{O}(\Delta t^3) \]

where \( \frac{\partial X}{\partial t} = \Delta X \), the Laplacian of the position, is the driving term in MCF, and \( \frac{\partial^2 X}{\partial t^2} \) is obtained from differentiating this expression again. While this quadratic approximation is computationally straightforward and effective for small time steps, it does not inherently capture the geometric structure of the flow. 

For curvature-driven flows, such as the evolution of a sphere under MCF, where the curvature \( H \) is constant at each point, circular arcs may provide a more natural approximation. Circular arcs reflect the constant curvature evolution by following a trajectory where the velocity of each vertex aligns with the normal direction, and the path of the vertex forms part of a circle. This would involve approximating the update as:

\[X(u, t_k + \Delta t) \approx X(u, t_k) + r(\cos(\theta) - 1) \mathbf{n},\]

where \( r \) is the radius of curvature and \( \theta \) is the angle swept by the vertex in time \( \Delta t \), with \( \mathbf{n} \) being the surface normal. While circular arcs introduce more computational complexity, they better approximate the geometric behavior of curvature-dominated flows and may lead to improved accuracy and stability in such cases.

Key Takeaways:

  1. MCF is important in GP!
  2. Lots of discretization approaches for the Laplacian exist, but none of them preserves every natural property of its ideal continuous form. You choose what is suitable for your problem and application.
  3. Coming up with new equivalent formulations for the mean curvature \(H\) and for the rate of change of the position function \(X\) of points on the surface would open more doors for finding new, economical discretizations.

Future work: I will venture further into the mathematics of MCF, focusing specifically on points 2 and 3 from the key takeaways. Additionally, I will explore some tangents regarding higher-order integrators for MCF and other geometric flows.


A Humorous Fail:

On the second day, while coding the first-order discretizations, I forgot to include the mass matrix \(\mathbf{M}\), which resulted in a smoothed yet horribly deformed bear. This demonstrates the critical role of \(\mathbf{M}\) in ensuring that the discretization respects the surface’s geometry by appropriately distributing weight across the vertices according to the areas of the surrounding triangles.

  1. When the basis functions used are piecewise linear and the mesh structure is uniform.

Bibliography:

  • Justin Solomon (Director). (2013, May 8). Lecture 12: Finite Elements and the Laplacian [Video recording]. https://www.youtube.com/watch?v=7_xDIg-pOC4
  • Huisken, G. (1984). Flow by mean curvature of convex surfaces into spheres. Journal of Differential Geometry, 20(1), 237-266.
  • Desbrun, M., Meyer, M., Schröder, P., & Barr, A. H. (1999, July). Implicit fairing of irregular meshes using diffusion and curvature flow. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques (pp. 317-324).
  • Patanè, G. (2017). An introduction to Laplacian spectral distances and kernels: Theory, computation, and applications. In ACM SIGGRAPH 2017 Courses (pp. 1-54).
  • Hughes, T. J. R. (2000). The finite element method: Linear static and dynamic finite element analysis. Dover Publications.
  • Evans, L. C. (2010). Partial differential equations. American Mathematical Society.


Part I: Manifolds—Exploring Differential and Discrete Geometry Perspectives

ABSTRACT

In this three-part series, we rigorously explore the concept of manifolds through the perspectives of both differential geometry and discrete differential geometry. In Part I, we establish the formal definition of a manifold as a special type of topological space and present illustrative examples. In Part II, we introduce the additional structure needed to define differentiable manifolds. Finally, Part III presents the discretization of manifolds within the framework of discrete differential geometry, where we approximate smooth manifolds using simplicial complexes or polygonal meshes. Looking at the concept from both perspectives is an opportunity to gain deeper insight into both types of geometry. The series is nearly self-contained, requiring only a basic understanding of naive set theory and elementary calculus from the reader.

Introduction

A manifold is a special kind of topological space, so special, in fact, that mathematicians have given it its own name. The term “manifold” traces back to the Old English manigfeald and Proto-Germanic maniġfaldaz, meaning “many folds” or “layers.” This etymology descriptively captures the essence of what a manifold represents: a space with many dimensions or complexities, yet with a coherent structure. To define a manifold formally, we first introduce the concept of a general topological space. Only after this, we can talk about the specific properties that a topological space must have to be considered a manifold.

Topological Spaces

Definition. Let \( M \) be a set. Then a choice \( \mathcal{O} \subseteq \mathcal{P}(M) \) is called a topology on \( M \) if:

  1. \( \emptyset \in \mathcal{O} \) and \( M \in \mathcal{O} \);
  2. If \( U_1, \dots, U_n \in \mathcal{O} \), then \( \bigcap_{i=1}^n U_i \in \mathcal{O} \) (closure under finite intersections);
  3. For any arbitrary collection of sets \( \mathcal{C} \subseteq \mathcal{O}\), \( \bigcup \mathcal{C} \in \mathcal{O}\) (closure under arbitrary unions).

And the pair \( (M, \mathcal{O}) \) is called a topological space.

Abuse of Notation. In this note, we sometimes abbreviate \( (M, \mathcal{O}) \) to just \(M\), leaving the topology \( \mathcal{O}\) implicit.

In mathematics, a topology on a set provides the weakest structure needed to define two very important notions: the convergence of sequences to points in the set, and the continuity of maps between two sets. Unless \( |M|=1 \), there are many different topologies one could establish on the same set. Depending on which topology you have on \(M\), the notions of continuity and convergence change accordingly.

The following table shows us how many different topologies one can establish on a set based on its cardinality.

| \( \lvert M \rvert \) | Number of Topologies |
|---|---|
| 1 | 1 |
| 2 | 4 |
| 3 | 29 |
| 4 | 355 |
| 5 | 6,942 |
| 6 | 209,527 |
| 7 | 9,535,241 |

Examples of Topologies

  1. Chaotic (trivial) topology: For the set \( M = \{a, b, c\} \), the chaotic topology includes only the entire set and the empty set: \( \mathcal{O} = \{\emptyset, M\} \). This topology is called “chaotic” because it has the least structure, and can be defined on any set.
  2. Discrete Topology: For the set \( M = \{a, b, c\} \), the discrete topology includes every possible subset of \( M \): \( \mathcal{O} = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\},\{b, c\}, \{a, b, c\}\}\). This topology provides the most structure, and can be defined on any set.
  3. Standard Topology on \( \mathbb{R} \) (Open Interval Topology): For the set \( M = \mathbb{R} \) (the real numbers), the standard topology is generated by open intervals \( (a, b) \) where \( a, b \in \mathbb{R} \) and \( a < b \): \( \mathcal{O} = \{U \subseteq \mathbb{R} \mid U \text{ is a union of open intervals } (a, b)\}\)
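Because the axioms involve only finitely many set operations when \(M\) is finite, a candidate topology can be checked mechanically. Here is a toy Python checker (purely illustrative and my own construction; it brute-forces all subcollections, so it is only sensible for tiny sets), applied to the trivial and discrete topologies above:

```python
from itertools import chain, combinations

def is_topology(M, O):
    """Check the three topology axioms for a finite set M and a candidate
    collection O of subsets: (1) empty set and M belong to O; (2) and (3)
    O is closed under intersections and unions of every subcollection."""
    O = set(map(frozenset, O))
    if frozenset() not in O or frozenset(M) not in O:
        return False
    for col in chain.from_iterable(combinations(O, r) for r in range(2, len(O) + 1)):
        if frozenset.intersection(*col) not in O or frozenset.union(*col) not in O:
            return False
    return True

M = {"a", "b", "c"}
trivial  = [set(), M]
discrete = [set(s) for r in range(len(M) + 1) for s in combinations(M, r)]
print(is_topology(M, trivial), is_topology(M, discrete))   # True True
```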

Just as sets are distinguished from each other in set theory by one important property—their cardinality—we can define properties that help distinguish one topological space from another. There are many such topological properties. We will present those needed to distinguish a topological space that is a manifold from one that is not, namely the separation, compactness, and paracompactness properties.

Separation, Compactness, and Paracompactness of Topological Spaces

Separation Properties:

Separation properties are used to distinguish points and sets within a topological space, providing a way to understand how “separate” or “distinct” different points or subsets are. To illustrate, consider \(M = \{a, b, c\}\) and the topology \( \mathcal{O} = \{\emptyset, \{a, b, c\}\}\). This topology is fairly “blind to its elements”: it cannot tell apart any of the points \(a, b, c\)! But any metric space can tell its points apart (because \(d(x, y) > 0 \) when \( x \neq y\)). While we focus on one specific separation property—the \(T_2\), or Hausdorff, property—there are many other separation properties (many \(T\)s), some stronger and some weaker than \(T_2\), that also play important roles in topology.

Definition: A topological space \( (M, \mathcal{O}) \) is called a Hausdorff space (or \(T_2\) space) if for any two distinct points \( p, q \in M \), there exist disjoint open neighborhoods \( U \) and \( V \) such that \( p \in U \) and \( q \in V \). That is, the space satisfies the following condition:

For any \( p, q \in M\) with \(p \neq q,\) there exist disjoint open sets \(U, V \in \mathcal{O}\) such that \(p \in U\) and \(q \in V. \)

Example: Consider the topological space \( (\mathbb{R}^2, \mathcal{O}) \), where \( \mathcal{O} \) is the standard topology on \( \mathbb{R}^2 \). This space is \(T_2\) Hausdorff. The standard topology \( \mathcal{O} \) on \( \mathbb{R}^2 \) is the collection of all unions of open balls.

An open ball centered at a point \( (x_0, y_0) \) with radius \( r > 0 \) is:
\( B((x_0, y_0), r) = \{ (x, y) \in \mathbb{R}^2 \mid \sqrt{(x - x_0)^2 + (y - y_0)^2} < r \} \)

And indeed, \( \mathbb{R}^2 \) has the \(T_2\) (Hausdorff) property, since given any two distinct points in \( \mathbb{R}^2 \), you can always find two open balls around them that do not overlap.

More generally, the topological space \( (\mathbb{R}^d, \mathcal{O} )\) is \(T_2\) Hausdorff where \( \mathcal{O} \) is its standard topology.

Compactness, and Paracompactness:

Definition. Let \( (M, \mathcal{O}) \) be a topological space. An open cover of \( M \) is an arbitrary collection of open sets \( \{ U_{\alpha} \}_{\alpha \in A}\) from \( \mathcal{O}\) (possibly infinite or finite) such that: \[ M = \bigcup_{\alpha \in A} U_{\alpha} \]

A subcover is exactly what it sounds like: it takes only some of the \(U_{\alpha}\), while ensuring that \(M\) remains covered.

Definition. A topological space \( (M, \mathcal{O}) \) is called compact if every open cover of \( M \) has a finite subcover (i.e., there exists a finite \( F \subset A \) such that \(M = \bigcup_{\alpha \in F} U_{\alpha}\)).

Compactness is a property that generalizes the notion of closed and bounded sets in Euclidean space. A topological space is compact if every open cover of the space has a finite subcover. This means that, no matter how the space is covered by open sets, it is possible to select a finite number of those sets that still cover the entire space. Compact spaces have several important properties.

In many mathematical contexts, when developing and proving new theorems within the framework of topological spaces, it is common to first address the case where the space is compact. Once the theorem is established for compact spaces, efforts are then made to extend the result to non-compact spaces; sometimes this extension is not possible. Paracompactness, on the other hand, is a generalization of compactness (a much weaker notion), and it is rare to find a topological space that is not paracompact.

Paracompactness:

Definition. A topological space \( (M, \mathcal{O}) \) is called paracompact if every open cover has an open refinement that is locally finite.

Given an open cover \( \{ U_{\alpha} \}_{\alpha \in A}\) of \(M\), an open refinement \( \{ V_{\beta} \}_{\beta \in B} \) of this cover is another open cover where every \( V_{\beta} \) is contained in some \( U_{\alpha} \) (i.e., \(\{ V_{\beta} \}_{\beta \in B}\) is a refinement if for every \( \beta \in B \) there is some \( \alpha \in A \) with \( V_{\beta} \subset U_{\alpha} \)).

In other words, \( \{ V_{\beta} \}_{\beta \in B} \) is a finer cover than \( \{ U_{\alpha} \}_{\alpha \in A}\), meaning that each open set in the refinement is more “localized” or “smaller” in some sense compared to the original cover.

Definition. The refinement is said to be locally finite if every point in \( M \) has a neighborhood that intersects only finitely many of the sets \( V_{\beta} \).

This means that around any given point, only a finite number of the open sets in the cover are “active” or have non-empty intersections with the neighborhood.

In summary: Compactness ensures that any cover can be reduced to a finite cover, while paracompactness ensures that any cover can be refined to a locally finite cover. Compactness deals with the ability to reduce the size of a cover, while paracompactness deals with the ability to organize the cover more effectively without too much local overlap.

Now, we are ready to lay down the formal definition of a manifold!

Manifolds

Definition: A paracompact, Hausdorff topological space \( (M, \mathcal{O}) \) is called a \(d\)-dimensional manifold if for every point \( p \in M \), there exists a neighborhood \( U(p) \) of \( p \) and a homeomorphism \( \varphi: U(p) \to \varphi(U(p)) \subset \mathbb{R}^d \). In this case, we also write \( \dim M = d \).

What are homeomorphisms? Homeomorphisms (“homeos”) are structure-preserving maps between topological spaces. Formally, we say that a map \( \varphi: (M, \mathcal{O}_M) \to (N, \mathcal{O}_N) \) is called a homeomorphism if it satisfies the following conditions:

  • \( \varphi: (M, \mathcal{O}_M) \to (N, \mathcal{O}_N) \) is a bijection
  • \( \varphi: (M, \mathcal{O}_M) \to (N, \mathcal{O}_N) \) is continuous
  • The inverse map \( \varphi^{-1}: (N, \mathcal{O}_N) \to (M, \mathcal{O}_M) \) is also continuous.

This definition tells us that a \(d\)-manifold is a special type of topological space, one whose points we can tell apart and which locally resembles \(\mathbb{R}^d\), and it gives us two equivalent ways to think about it:

  • Locally: for any arbitrary point \(p \in M\), you can always find an open set that contains it and this open set can be mapped by some homeo to a subset of \(\mathbb{R}^d\). For example, to someone standing on the surface of the Earth, the Earth looks much like \(\mathbb{R}^2\).
  • Globally: there exists an open cover \( \{ U_{\alpha} \}_{\alpha \in A}\) (possibly infinite) of \(M\) such that every \(U_{\alpha}\) is mapped by some homeo to a subset of \(\mathbb{R}^d\). For example, from outer space, the Earth can be covered by two hemispherical pancakes.

Examples of Manifolds

  1. The sphere \(S^2\) is a 2-manifold: every point in the sphere has a small open neighborhood that looks like a subset of \(\mathbb{R}^2\). One can cover the Earth with just two hemispheres, and each hemisphere is homeomorphic to a disk in \(\mathbb{R}^2\).
  2. The circle \(S^1\) is a 1-manifold; every point has an open neighborhood that looks like an open interval.
  3. The torus \(T^2\) and the Klein bottle are 2-manifolds too.

An example of a topological space that is not a manifold is the closed \(n\)-dimensional disk \(D^n\), because it has a boundary: points on the boundary do not have open neighborhoods that can be mapped by some homeo to an open subset of \(\mathbb{R}^n\).

Definition. The closed n-dimensional disk, denoted by \( D^n \), is defined as the set of all points \( \mathbf{x} \in \mathbb{R}^n \) such that the Euclidean norm of \( \mathbf{x} \) is less than or equal to 1. Formally,
\[ D^n = \{ \mathbf{x} \in \mathbb{R}^n \mid |\mathbf{x}| \leq 1 \} \]
where \( |\mathbf{x}| = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2} \) is the Euclidean norm of the vector \( \mathbf{x} = (x_1, x_2, \dots, x_n) \).

Additional terminology: Atlases and Charts

The Terminology of a Chart on a \(d\)-manifold:

Let \(M\) be a \(d\)-manifold. Then a chart on \(M\) is a pair \( (U, \varphi) \), where:

  • \( U \) is an open subset of \( M \).
  • \( \varphi: U \to \varphi(U) \subset \mathbb{R}^d \) (often called the coordinate map or coordinate chart) is a homeomorphism.

The component functions of \( \varphi: U \to \varphi(U) \subset \mathbb{R}^d \) are the mappings:

\[\varphi^{i}: U \to \mathbb{R}\]
\[p \mapsto proj_i(\varphi(p))\]

For \(1 \leq i \leq d\), where \(proj_i(\varphi(p))\) is the \(i\)-th component of \( \varphi (p) \in \mathbb{R}^d\).

This means that the map \( \varphi \) takes every point \(p\) in \( U \) and assigns it coordinates \(proj_i(\varphi(p))\) in \( \mathbb{R}^d = \mathbb{R}\times \mathbb{R} \times \dots \times \mathbb{R}\) (\(d\) times) with respect to the chart \((U, \varphi) \).
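
For concreteness, here is a simple chart (an illustrative example of the terminology): take \( M = S^1 \subset \mathbb{R}^2 \), let \( U = S^1 \setminus \{(1, 0)\} \), and define \( \varphi: U \to (0, 2\pi) \subset \mathbb{R} \) by \( \varphi(\cos\theta, \sin\theta) = \theta \). Then \( \varphi \) is a homeomorphism onto its image, so \( (U, \varphi) \) is a chart, and \( \varphi^1 = \varphi \) is its single component function (here \( d = 1 \)). Note that \( U \) misses the point \( (1, 0) \), so at least one more chart is needed to cover all of \( S^1 \).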

Remarks.

  1. Notice that the discussion above does not introduce any new information beyond what is contained in the definition of a \(d\)-dimensional topological manifold. This is why a “chart” is more of a terminology than a definition—though it is a useful one.
  2. We can see by now that there can exist a set \( \mathscr{A}\) of charts, one for each open set in an open cover \( \{ U_{\alpha} \}_{\alpha \in A}\) of \(M\), and many of these charts will overlap: different charts are usually needed to cover the entire manifold, since a single chart often cannot do so (for instance, a single chart cannot cover the entire surface of a sphere without singularities or overlaps).

Definition. An atlas of a manifold \( M \) is a collection \( \mathscr{A} := \{(U_\alpha, \varphi_\alpha) \mid \alpha \in A\} \) of charts such that:\[\bigcup_{\alpha \in A} U_\alpha = M.\]

Well, where do you think the words “chart” and “atlas” come from? 🙂

So what happens if charts overlap? A natural map, called the transition map, presents itself, and it is always continuous as a consequence of the definition of a topological \(d\)-manifold.

Definition. Two charts \((U_1, \varphi_1)\) and \((U_2, \varphi_2)\) are called \(C^0\)-compatible if either:

  1. \(U_1 \cap U_2 = \emptyset \), or
  2. \(U_1 \cap U_2 \neq \emptyset \) and the (transition) map \( \varphi_2 \circ \varphi_1^{-1} : \varphi_1(U_1 \cap U_2) \to \varphi_2(U_1 \cap U_2)\) is continuous.

By definition, one can go from \(U_1\) into \(\varphi_1 (U_1) \subseteq \mathbb{R}^d\), and similarly one can go from \(U_2\) into \(\varphi_2 (U_2) \subseteq \mathbb{R}^d\). For the points in \( U_1 \cap U_2 \), one could apply either \(\varphi_1\) or \( \varphi_2 \) to land in the subsets \( \varphi_1 (U_1 \cap U_2) \) or \( \varphi_2 (U_1 \cap U_2) \) of \( \mathbb{R}^d \). All of a sudden, we have constructed a map \( \varphi_2 \circ \varphi_1^{-1} \) between these two subsets of \( \mathbb{R}^d \), and this map is always continuous.

This definition may seem redundant, and indeed it is: it applies to every pair of charts on a topological manifold. However, it is just a “warm-up”, since we will later refine it and define the differentiability of maps on a manifold in terms of \(C^k\)-compatibility of charts.

Example. Consider a 2-dimensional manifold \( M \), such as the surface of a globe (a sphere). One chart \( (U_1, \varphi_1) \) might cover the Northern Hemisphere, with \( \varphi_1 \) assigning each point in \( U_1 \) latitude and longitude coordinates. Another chart \( (U_2, \varphi_2) \) might cover the Southern Hemisphere. In the overlap \( U_1 \cap U_2 \), the transition map \( \varphi_2 \circ \varphi_1^{-1} \) converts coordinates from the Northern Hemisphere chart to the Southern Hemisphere chart.
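
To make a transition map fully explicit in a small case, consider \( S^1 \) with the two angle charts \( \varphi_1: S^1 \setminus \{(-1,0)\} \to (-\pi, \pi) \) and \( \varphi_2: S^1 \setminus \{(1,0)\} \to (0, 2\pi) \), each sending a point \( (\cos\theta, \sin\theta) \) to the angle \( \theta \) in the indicated range. The overlap consists of the upper and lower open half-circles, and on it
\[ (\varphi_2 \circ \varphi_1^{-1})(\theta) = \begin{cases} \theta & \theta \in (0, \pi) \\ \theta + 2\pi & \theta \in (-\pi, 0), \end{cases} \]
which is continuous on each component of the overlap, just as the definition promises.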

Remark. The structure of a topological \(d\)-manifold \(M\) allows us to distinguish subspaces (sub-manifolds) from each other and provides the framework to discuss the continuity of functions defined on \(M\). For example, if you have a curve \( c: \mathbb{R} \to M\) on the manifold, a function \( \mu: M \to \mathbb{R} \), or even a map \(\phi: M \to M\), you can talk about the continuity of \(c\), \(\mu\), and \(\phi\). However, the topological structure alone is not sufficient to discuss their differentiability. To do so, we need to impose an additional structure on \(M\), such as a smooth structure, to define and talk about differentiability.

In part II, we will talk more about Differentiable Manifolds.

Bibliography:

Schuller, F. (2015, September 22). Topological manifolds and manifold bundles – Lec 06 [Video]. YouTube. https://www.youtube.com/watch?v=uGEV0Wk0eIk

Ananthakrishna, G., Conway, A., Ergen, E., Floris, R., Galvin, D., Hobohm, C., Kirby, R., Kister, J., Kosanović, D., Kremer, C., Lippert, F., Merz, A., Mezher, F., Niu, W., Nonino, I., Powell, M., Ray, A., Ruppik, B. M., & Santoro, D. (n.d.). Topological Manifolds.

Munkres, J. R. (2000). Topology (2nd ed.). Pearson.

Categories
Uncategorized

The Origins of ‘Bug’ and ‘Debugging’ in Computing and Engineering

By Ehsan Shams

During these six fascinating weeks at SGI, I have said and heard the words “bug” and “debugging” so often that if I had an inch for every time, I could go from Egypt to Tahiti and back a few dozen times. Haha! As an etymology enthusiast, I couldn’t resist digging into the origins of these terms. Here’s what I discovered.

The use of the term “bug” in engineering and technology contexts dates back to at least the 19th century. The term was used by Thomas Edison in an 1878 letter to Tivadar Puskás, in which Edison referred to minor engineering flaws or difficulties as “bugs,” implying that these small issues were natural in the process of invention and experimentation.

Specifically, Edison wrote about the challenges of refining his inventions in this letter, stating:

“It has been just so in all of my inventions. The first step is an intuition—and comes with a burst, then difficulties arise—this thing gives out and [it is] then that ‘bugs’—as such little faults and difficulties are called—show themselves, and months of intense watching, study and labor are requisite before commercial success—or failure—is certainly reached.”

Thomas Edison and His ‘Insomnia Squad’: Debugging Inventions All Night and Napping Through the Day. Photo: Bettmann/Getty Images

The term gained further prominence in the mid-20th century, especially in computing. In 1947, engineers working on the Mark II computer at Harvard University found that the machine was malfunctioning. Upon investigation, Grace Hopper discovered that a moth had become trapped in one of the machine’s relays (Relay #70 on Panel “F” of the Harvard Mark II Aiken Relay Calculator), causing the problem; she took a photo of it and documented the incident.

Harvard Mark II Aiken Relay Calculator

The engineers removed the moth and taped it into the log book with the note: “First actual case of a bug being found.” While the term “bug” had already been in use to describe technical issues or glitches, this incident provided a literal example of the term, and it is often credited with popularizing the concept of “de-bugging” in computer science.

The logbook, complete with the taped moth, is preserved in the Smithsonian National Museum of American History as a notable piece of computing history. This incident symbolically linked the term “bug” with computer errors in popular culture and the history of technology.

It turns out that the term “bug” has a rich etymology, originating in Middle English, where it referred to a ghost or hobgoblin that troubled and scared people. By the 16th century, “bug” came to describe insects, particularly pests such as bedbugs, which cause discomfort. Then, in the 19th century, engineers adopted “bug” as a metaphor for small flaws or defects in machinery, likening these issues to tiny pests that could disrupt a system’s operation.

It’s hilarious how the word “bug” still sends shivers down our spines—back then, it was ghosts and goblins, and now it could be a missing bracket hiding somewhere in a hundred-something-line program, a variable you forgot to declare, or an infinite loop that makes your code run forever!

References:

  1. Edison, T. (1878). Letter to Tivadar Puskás. The Thomas Edison Papers. Rutgers University. https://edisondigital.rutgers.edu/document/D7821ZAO#?xywh=-2%2C-183%2C940%2C1090
  2. National Geographic Education. (n.d.). World’s first computer bug. Retrieved August 18, 2024, from https://education.nationalgeographic.org/resource/worlds-first-computer-bug
  3. Hopper, G. M. (1987). The Mark II computer and the first ‘bug’. Proceedings of the ACM History of Programming Languages Conference.


Categories
Math

The Area of a Circle

Author: Ehsan Shams (Alexandria University, Egypt)

“Since people have tried to prove obvious propositions,
they have discovered that many of them are false.”
Bertrand Russell

Archimedes proposed an elegant argument indicating that the area of a circle (disk) of radius \(r\) is \( \pi r^2 \). This argument has gained attention recently and is often presented as a “proof” of the formula. But is it truly a proof?

The argument is as follows: divide the circle into \(2^n\) congruent wedges and rearrange them into a strip; for any \( n\), the total area of the wedges (the area of the strip) equals the area of the circle. The top and bottom of the strip each have arc length \( \pi r \). As \( n \to \infty \), the wavy strip becomes a rectangle of width \( \pi r \) and height \( r \). This rectangle has area \( \pi r^2 \), so the area of the circle is also \( \pi r^2 \).

These images are taken from [1].

To consider this argument a proof, several things need to be shown:

  1. The circle can be evenly divided into such wedges.
  2. The number \( \pi \) is well-defined.
  3. The notion of “area” of this specific subset of \( \mathbb{R}^2 \) -the circle – is well-defined, and it has this “subdivision” property used in the argument. This is not trivial at all; a whole field of mathematics called “Measure Theory” was founded in the early twentieth century to provide a formal framework for defining and understanding the concept of areas/volumes, their conditions for existence, their properties, and their generalizations in abstract spaces.
  4. The limiting operations must be made precise.
  5. The notions of area and arc length are preserved under the limiting operations of 4.

Archimedes’ elegant argument can be made rigorous, but doing so takes some work and a very long sheet of paper.

To provide some insight into the difficulty of making satisfactorily precise sense of seemingly obvious, simple-to-grasp geometric notions and intuitive operations that are often taken for granted, let us briefly inspect each element from the list above.

Defining \( \pi \):

In school, we were taught that the definition of \( \pi \) is the ratio of the circumference of a circle to its diameter. For this definition to be valid, it must be shown that this ratio is the same for all circles, which is not immediately obvious; in fact, it does not hold in non-Euclidean geometry.

A tighter definition goes like this: the number \( \pi \) is the circumference of a circle of diameter 1. Yet, this still leaves some ambiguity because terms like “circumference,” “diameter,” and “circle” need clear definitions, so here is a more precise version: the number \( \pi \) is half the circumference of the circle \( \{ (x,y) | x^2 + y^2 =1 \} \subseteq \mathbb{R}^2 \). This definition is clearer, but it still assumes we understand what “circumference” means.

From calculus, we know that the circumference of a circle can be defined as a special case of arc length. Arc length itself is defined as the supremum of a set of sums, and in nice cases, it can be exactly computed using definite integrals. Despite this, it is not immediately clear that a circle has a finite circumference.

Another ancient limiting argument, used since Archimedes to estimate (and even to define) the circumference of a circle, and hence to compute \( \pi \) to any desired accuracy, relies on inscribed and circumscribed regular polygons. Still, making precise sense of the circumference of a circle, and hence of the number \( \pi \), is a rather subtle matter.
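
For the curious, the polygon argument is easy to play with numerically. Below is a small MATLAB sketch (mine, for illustration; the starting hexagon and the number of doublings are arbitrary choices) that brackets \( \pi \) between the half-perimeters of inscribed and circumscribed regular polygons of a unit circle, using the classical side-doubling recurrence \( s_{2n} = \sqrt{2 - \sqrt{4 - s_n^2}} \).

% Archimedes-style two-sided estimate of pi for the unit circle,
% starting from the inscribed regular hexagon (side length 1).
s = 1.0;   % side length of the inscribed n-gon
n = 6;     % number of sides
for k = 1:10
    lowerBound = n * s / 2;              % half-perimeter, inscribed n-gon
    upperBound = n * s / sqrt(4 - s^2);  % half-perimeter, circumscribed n-gon
    fprintf('n = %5d: %.10f < pi < %.10f\n', n, lowerBound, upperBound);
    s = sqrt(2 - sqrt(4 - s^2));         % side length after doubling n
    n = 2 * n;
end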

The Limiting Operation:

In Archimedes’ argument, the limiting operation can be formalized by regarding the bottom of the wavy strip (curve) as the graph of a function \(f_n \), and the limiting curve as the graph of a constant function \(f\). Then \( f_n \to f \) uniformly.

The Notion of Area:

The whole of Euclidean geometry deals with the notions of “area” and “volume” for arbitrary objects in \( \mathbb{R}^2 \) and \( \mathbb{R}^3 \) as if they were inherently defined for such objects and merely needed to be calculated. The calculation was done by cutting the object into finitely many simple parts, rearranging them by rigid motions (rotations and translations), and reassembling the parts into a simpler body whose area or volume we already know. This type of geometry relies on three hidden assumptions:

  • Every object has a well-defined area or volume.
  • The area or volume of the whole is equal to the sum of the areas or volumes of its parts.
  • The area or volume is preserved under such rearrangements.

This is not automatically true for arbitrary objects; consider, for example, the Banach–Tarski paradox. Therefore, proving that a well-defined notion of area exists for the specific subset describing the circle, and that it is preserved under the subdivision, rearrangement, and limiting operations considered, is crucial for validating Archimedes’ argument as a full proof of the area formula. Measure theory addresses these issues by providing a rigorous framework for defining and understanding areas and volumes. Thus, establishing points 1, 3, and 4, and the preservation of area under the limiting operation, requires some effort but is achievable through measure theory.

Arc Length under Uniform Limits:

The other part of point 5 is slightly tricky, because arc length is not, in general, preserved under uniform limits. To illustrate this, consider the staircase approximation of the diagonal of a unit square in \( \mathbb{R}^2 \). As the steps get finer, the staircase curves converge uniformly to the diagonal, yet every staircase curve has total length 2, so the lengths converge to 2 rather than to \( \sqrt{2} \). This example demonstrates that arc length, as a function of the curve, is not continuous with respect to uniform convergence. However, arc length is preserved under uniform limits if the derivatives of the functions converge uniformly as well. Is this provable in Archimedes’ argument? Yes, with some work.
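
Since this example is so central, here is a tiny MATLAB sketch (illustrative; the step counts are arbitrary) that makes the point numerically: the staircase curves get uniformly close to the diagonal, yet their lengths never budge from 2.

% The staircase "paradox": uniform convergence to the diagonal of the
% unit square, while the polygonal length stays equal to 2.
for n = [1 2 4 8 16 32]
    x = 0; y = 0;
    for i = 1:n
        x = [x, i/n, i/n];          % step right, then up
        y = [y, (i-1)/n, i/n];
    end
    len = sum(hypot(diff(x), diff(y)));   % total length (always 2)
    dev = max(abs(y - x));                % sup-distance to the diagonal, ~1/n
    fprintf('n = %2d: length = %.4f, sup-distance = %.4f\n', n, len, dev);
end
fprintf('length of the diagonal itself: %.4f\n', sqrt(2));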

Moral of the Story:

There is no “obvious” in mathematics. It is important to prove mathematical statements using strict logical arguments from agreed-upon axioms without hidden assumptions, no matter how “intuitively obvious” they seem to us.

“The kind of knowledge which is supported only by
observations and is not yet proved must be carefully
distinguished from the truth; it is gained by induction, as
we usually say. Yet we have seen cases in which mere
induction led to error. Therefore, we should take great
care not to accept as true such properties of the numbers
which we have discovered by observation and which are
supported by induction alone. Indeed, we should use such
a discovery as an opportunity to investigate more exactly
the properties discovered and to prove or disprove them;
in both cases we may learn something useful.”
L. Euler

“Being a mathematician means that you don’t take ‘obvious’
things for granted but try to reason. Very often you will be
surprised that the most obvious answer is actually wrong.”
–Evgeny Evgenievich

Bibliography

  1. Strogatz, S. (2010, April 4). Take it to the limit. Opinionator, The New York Times. https://archive.nytimes.com/opinionator.blogs.nytimes.com/2010/04/04/take-it-to-the-limit/
  2. Tao, T. (2011). An introduction to measure theory. Graduate Studies in Mathematics. https://doi.org/10.1090/gsm/126
  3. Pogorelov, A. (1987). Geometry. MIR Publishers.
  4. Blackadar, B. (2020). Real Analysis Notes
Categories
Math Research

Spline Construction with Closed-form Arc-Length Solution: A Guided Tour

Author: Ehsan Shams (Alexandria University, Egypt)
Project Mentor: Sofia Di Toro Wyetzner (Stanford University, USA)
Volunteer Teaching Assistant: Shanthika Naik (CVIT, IIIT Hyderabad, India)

In my first SGI research week, I went on a fascinating exploratory journey into the world of splines under the guidance of Sofia Wyetzner, and gained an insight into how challenging yet intriguing their construction can be, especially under tight constraints such as the requirement of a closed-form expression for their arc length. This specific construction was the central focus of our inquiry, and the main drive for our exploration.

Splines are important mathematical tools, for they are used in various branches of mathematics, including approximation theory, numerical analysis, and statistics, and they find applications in several areas such as Geometry Processing (GP). Loosely speaking, such objects can be understood as mappings that take a set of discrete data points and produce a smooth curve that either interpolates or approximates these points. It thus becomes immediately natural to see why they are of great importance in GP since, in the most general sense, GP is a field mainly concerned with transforming geometric data¹ from one form to another. For example, splines like Bézier and B-spline curves are foundational tools for curve representation in computer graphics and geometric design (Yu et al., 2024).

Having access to arc lengths for splines in GP is essential for many tasks, including path planning, robot modeling, and animation, where accurate and realistic modeling of curve lengths is critically important. For application-driven purposes, having a closed formula for arc-length computation is highly desirable. However, constructing splines with this arc-length property that can interpolate \( k \) arbitrary data points reasonably well is indeed a challenging task.

Our mentor Sofia and our volunteer TA Shanthika guided us through an exploration of a central question in spline formulation research, as well as several related tangential questions:

Our central question was:

Given \( k \) points, can we construct a spline that interpolates these points and outputs the intermediate arc-lengths of the generated curve, with some continuity at control points?

And the tangential ones were:

1. Can we achieve \( G^1 \) / \( C^1\) / \( G^2 \) / \(C^2\) continuity at these points with our spline?
2. Given \(k\) points and a total arc-length, can we interpolate these points with the given arc-length in \( \mathbb{R}^2 \)?

In this article, I will share some of the insights I gained from my first week-long research journey, along with potential future directions I would like to pursue.

Understanding Splines and their Arc-length

What are Splines?

A spline is a mapping \( S: [a,b] \subset \mathbb{R} \to \mathbb{R}^n \) defined as:

\( S(t) = \begin{cases} s_1(t) & \text{for } t \in [t_0, t_1] \\ s_2(t) & \text{for } t \in [t_1, t_2] \\ \vdots \\ s_n(t) & \text{for } t \in [t_{n-1}, t_n] \end{cases} \)

where \( a = t_0 < t_1 < \cdots < t_n = b \), and the segments \( s_i: [t_{i-1}, t_i] \to \mathbb{R}^n \) are defined such that \(S(t)\) ensures \( C^k \) continuity at each internal point \( t_j \), meaning:

\( s_{j}^{(m)}(t_j) = s_{j+1}^{(m)}(t_j) \) for \( m = 0, 1, \dots, k \)

where \( s_{j}^{(m)} \) and \( s_{j+1}^{(m)} \) are the \( m \)-th derivatives of \( s_{j}(t) \) and \( s_{j+1}(t) \), respectively.

In other words, a spline is a mathematical construct used to create a smooth curve \( S(t) \) by connecting a sequence of simpler piecewise segments \( s_i(t) \) in a continuous manner. These segments \( s_i(t) \), for \( i = 1, 2, \ldots, n \), are defined on subintervals \( [t_{i-1}, t_i] \) of the parameter domain and are carefully joined end-to-end. The transitions at the junctions \( \{ t_j \}_{j=1}^{n-1} \) (also known as control points) maintain a desired level of smoothness, typically defined by the continuity of derivatives up to order \( k \) on \( \{s_i(t)\}_{i=1}^n \).

One way to categorize splines is based on the types of functions \(s_i\) they incorporate. For instance, some splines utilize polynomial functions such as Bézier splines and B-splines, while others may employ trigonometric functions such as Fourier splines or hyperbolic functions. However, the most commonly used splines in practice are those based on polynomial functions, which define one or more segments. Polynomial splines are particularly valuable in various applications because of their computational simplicity and flexibility in modeling curves.
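
As a minimal concrete example of a polynomial spline (a sketch; the data points below are arbitrary), MATLAB's built-in spline constructs a piecewise-cubic interpolant that is \( C^2 \) at the interior knots:

% A piecewise-cubic interpolating spline through a handful of points.
dataX = 0:4;                      % knot locations t_0 < t_1 < ... < t_n
dataY = [0 1 0 1 0];              % values to interpolate
tq = linspace(0, 4, 200);         % dense evaluation grid
yq = spline(dataX, dataY, tq);    % cubic spline, C^2 at interior knots
plot(dataX, dataY, 'o', tq, yq, '-');
legend('data points', 'cubic spline');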

Arc Length Calculation

Intuitively, the notion of arc length of a curve² can be understood as the numerical measurement of the total distance traveled along the curve from one endpoint to the other. This concept is fundamental in both calculus and geometry because it provides a way to quantify the length of curves that may not be straight lines. To calculate the arc length of smooth curves, we use integral calculus; specifically, we apply a definite integral formula (presented in the following theorem). But let us first define the concept formally.

Definition. Let \( \gamma : [a, b] \to \mathbb{R}^n \) be a parametrized curve, and let \( P = \{ a = t_0, t_1, \dots, t_m = b \} \) be a partition of \( [a, b] \). The polygonal approximate³ length of \( \gamma \) from \(P\) is given by the sum of the Euclidean distances between consecutive points \( \gamma(t_i)\):

\(L_P(\gamma) = \sum_{i=1}^{m} |\gamma(t_{i}) - \gamma(t_{i-1})|\)

where \( | \cdot | \) denotes the Euclidean norm in \( \mathbb{R}^n \). This polygonal approximation becomes a better estimate of the actual length of the curve as the partition \( P\) becomes finer (i.e., as the maximum distance between successive \( t_i \) tends to zero). The actual length of the curve can be defined as:

\( L(\gamma) = \sup_P L_P(\gamma) \)

If the curve \( \gamma \) is sufficiently smooth, the actual length of the curve can be computed using definite integration as shown in the following theorem.

Arc Length Theorem. Let \( \gamma : [a, b] \to \mathbb{R}^n \) be a \( C^1 \) curve. The arc length \( L(\gamma) \) of \(\gamma\) is given by:

\( L(\gamma) = \int_a^b |\gamma'(t)| \, dt \)

where \(\gamma'(t) \) is the derivative of \(\gamma\) with respect to \(t\).
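
As a quick sanity check of the theorem (a sketch, using a quarter of the unit circle, whose exact length is \( \pi/2 \)), one can compare the polygonal approximation with the integral formula in MATLAB:

% Polygonal approximation vs. the integral formula for
% gamma(t) = (cos t, sin t) on [0, pi/2]; both should approach pi/2.
t = linspace(0, pi/2, 1000);
x = cos(t);  y = sin(t);
polygonalLength = sum(hypot(diff(x), diff(y)));                    % L_P(gamma)
integralLength  = integral(@(s) hypot(-sin(s), cos(s)), 0, pi/2);  % |gamma'(t)| = 1
fprintf('polygonal: %.8f, integral: %.8f, exact: %.8f\n', ...
        polygonalLength, integralLength, pi/2);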

The challenge

As mentioned earlier, in the context of spline construction for GP tasks, one is ideally interested in constructing splines that have closed-form solutions for their arc length (a formula for computing it exactly). However, curves with this property are relatively rare, because the arc-length integral often leads to elliptic integrals⁴ or other forms that do not have elementary antiderivatives. Still, there are some curves for which the arc length can be computed exactly: straight lines, circles, and (under certain conditions) parabolas all have closed-form expressions for arc length.
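
For instance (a standard computation, included here for concreteness), the parabola \( y = x^2 \) on \( [0, a] \) has
\[ L(a) = \int_0^a \sqrt{1 + 4x^2}\, dx = \frac{a}{2}\sqrt{1 + 4a^2} + \frac{1}{4}\ln\!\left( 2a + \sqrt{1 + 4a^2} \right), \]
an elementary closed form, whereas the superficially similar arc length of an ellipse already leads to a non-elementary (elliptic) integral.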

Steps to tackle the central question …

Circular Splines: A Starting Point.

On day one, our mentor Sofia walked us through a paper titled “A Class of \(C^2 \) Interpolating Splines” (Yuksel, 2020). In this work, the author introduces a novel class of non-polynomial parametric splines to interpolate a given set of \( k \) control points. Two components define the construction: interpolating functions and blending functions (defined later). Each interpolating function \(F_i \in \{F_i \}_{i=1}^n\) interpolates three consecutive control points, and the blending functions \( \{ B_i \}_{i=1}^m \) combine each two consecutive interpolating functions, forming a smooth curve between two control points. The blending functions are chosen so that \( C^2 \) continuity everywhere is ensured independently of the choice of the interpolating functions; the author uses trigonometric blending functions.

This formulation was constructed to attain some highly desirable properties not previously demonstrated together by other spline classes: \( C^2 \) continuity everywhere, local support, the ability to guarantee self-intersection-free curve segments regardless of the placement of control points, and the ability to form perfect circular arcs. This paper served as a good starting point in light of the central question under consideration, because among the interpolating functions introduced in the paper are circular curves, which have closed formulas for arc-length computation. In addition, it gives insight into the practice of spline formulation. However, circular interpolating functions are not without their limitations: their constant curvature makes it difficult to interpolate arbitrary data reasonably well, and they look bizarre sometimes.

Interesting note: The earliest documented reference – to the best of my knowledge – discussing the connection of two interpolating curves with a smooth curve dates back to MacQueen’s research in 1936 (MacQueen, 1936). MacQueen’s paper, titled “On the Principal Join of Two Curves on a Surface,” explores the concept of linking two curves on a surface.

Here is a demo constructed by the author to visualize the resulting spline from their class with different interpolating functions, and below is me playing with the different interpolating functions and looking at how bizarre the circular interpolating function gets when you throw data points around in an amorphous way.

While playing with the demo, the “Gnomus” musical track by Modest Mussorgsky was playing in parallel in the back of my mind, so I put it there for you too. It is a hilarious coincidence that the orchestra goes mad when it is the circular spline’s turn to interpolate the control points, and it does so more oddly and bizarrely than the other splines in question. It even goes beyond the boundary of the demo. Did you notice that? 🙂

By the end of the day, I was left with two key inquiries, and a starting point for investigating them:

How do we blend desirable interpolating functions to construct splines with the properties we want? Can we use a combination of circular and elliptical curves to achieve more flexible and accurate interpolation for a wider variety of data points while maintaining a closed form for their arc length? What other combinations could serve us well?
I thought to myself: I should re-visit my functional analysis course as a starting point to think about this in a clear way.

From day two to day five, we followed a structured yet not restrictive approach, akin to “we are all going to a definite destination, but everyone could stop to explore something that captured their attention along the way and share what intrigued them with others”. This approach was quite effective and engaging:

  • Implementing a user-interactive visualization tool: Our first task was to develop a simple user interface for visualizing existing spline formulations. My SGI project team and friends—Charuka Bandara, Brittney Fahnestock, Sachin Kishan, and I—worked on this in Python (for Catmull-Rom splines) and MATLAB (for cubic and quadratic splines). This tool allowed us to visualize how different splines behave and change shape whenever the control points change. It also restored my love for coding, as I had not coded in a while … you know, nothing is more joyful than watching your code execute exactly what you want it to do, right?
    Below is a UI for visualizing a Cubic Spline. Find the UI for the Quadratic Spline, and the Catmull-Rom here.
Interactive Cubic Spline
  • Exploring the blending-function method: As a natural progression towards our central inquiry, and as a complementary task to reading Yuksel’s paper, we eventually found our way to exploring blending functions—a topic I had been eagerly anticipating.

Here, I decided to pause and explore more about the blending function method in spline formulation.

The blending-function method provides a way to construct a spline \( S(t) \) as a linear combination of a set of control points \( \{p_i\}_{i=1}^n \) and basis functions (blending functions) \( \{B_i(t)\}_{i=1}^n \) in the following manner:

\( S(t) = \sum_{i=1}^n p_i B_i(t)\) (*)

where \( t \) is the parameter, defined over the interval of interest \( [a,b] \).

These blending functions \(B_i(t)\) typically exhibit certain properties that govern, for example, the smoothness, continuity, shape, and preservation of key characteristics that we desire in the resulting interpolating splines. Thus, by carefully selecting and designing those blending functions, it is possible to tailor the behaviour of spline interpolation to meet the specific requirements and achieve desired outcomes.

Below are some of the properties of important blending functions:

  1. Partition of Unity: \( \sum_{i=1}^n B_i(t) = 1, \ \forall t \in [a,b] \); this is also called coordinate-system independence. This property is important because it makes the curve an affine combination of the control points (convex, once the blending functions are also non-negative, as in property 3), which is exactly what is needed so that the curve does not change if the coordinate system does. One way to visualize this is to imagine the control points as beads sitting in an amorphous manner on a sheet of paper and the interpolating curve as a thread going through them; if you move the sheet of paper around, the thread should move rigidly with the beads rather than shift relative to them, and that is what this property guarantees. Notice that if you pick an arbitrary set of blending functions, this property is not automatically actualized, and the curve would change.
  2. Local Support: Each blending function \( B_i(t) \) is nonzero only on a subinterval \( I_i \subset [a,b] \) and vanishes everywhere else on the domain. This property is important because it ensures computational efficiency: with local support, modifying one control point affects only a local region of the curve rather than the entire curve.
  3. Non-negativity: Blending functions are often non-negative over the domain of definition \( [a,b] \); together with the partition of unity, this gives the convex-hull property (the curve stays inside the convex hull of its control points). It is important for maintaining the stability and predictability of the interpolating spline, preventing the curve from oscillating wildly or misrepresenting the data points in question.
  4. Smoothness: Blending functions dictate the overall smoothness of the resulting spline, since \( C^k(\mathcal{K}) \), the space of \(k\)-times continuously differentiable functions defined on a closed, compact set \(\mathcal{K}\), is a vector space over \(\mathbb{R}\) (or \(\mathbb{C}\)): linear combinations of \(C^k\) blending functions are again \(C^k\).
  5. Symmetry: Blending functions that are symmetric about the central control point produce curves that do not change if the points are ordered in reverse. In this context, symmetry is assured if and only if \( \sum_{i=1}^n B_i(t)\, p_i = \sum_{i=1}^n B_i((a+b)-t)\, p_{n+1-i} \), which holds if \(B_i(t) = B_{n+1-i}((a+b)-t) \). For instance, Timmer’s parametric cubic and Ball’s cubic – a variant of the cubic Bézier curve – obey this property.

In principle, there can be as many properties imposed on the blending functions depending on the desired aspects one wants in their interpolating spline \( S(t) \).

Remark. The spline formulation (*) describes a weighted sum of the given control points \( \{p_i\}_{i=1}^n\). In other words, each control point influences the curve by pulling it in its direction, and the associated blending function determines the strength of this pull. Sometimes, in trivial cases, one does not need blending functions at all.
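
To make (*) concrete, here is a minimal MATLAB sketch (illustrative control points; not from the project code) using the cubic Bernstein polynomials, the blending functions of a cubic Bézier curve; they are non-negative and sum to one, so properties 1 and 3 hold by construction:

% S(t) = sum_i p_i B_i(t) with the cubic Bernstein basis on [0, 1].
P = [0 0; 1 2; 3 3; 4 0];                 % control points p_1..p_4 (rows)
t = linspace(0, 1, 200)';                 % parameter samples
B = [(1-t).^3, 3*t.*(1-t).^2, 3*t.^2.*(1-t), t.^3];   % B_1..B_4, each >= 0
S = B * P;                                % the curve: weighted sum of points
plot(P(:,1), P(:,2), 'o--', S(:,1), S(:,2), '-');
legend('control polygon', 'curve S(t)');
max(abs(sum(B, 2) - 1))                   % partition of unity: ~0 up to roundoff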

  • Brainstorming new spline formulations: Finally, we were prepared for our main task. We brainstormed new spline formulations; in doing so, we first looked at different interpolating curves (catenaries, parabolas, circles) for interpolation and arc-length calculation, explored \(C^1\) and \( C^2 \) continuity at control points, and did the math on paper (something I miss nowadays) for the 3-point case, then laid down the foundation for the \(k\)-point interpolation problem. I worked with the parabolas because I love them.

In parallel, I looked a bit into tangential question two … it is an interesting question:

Given \(k\) points and a total arc-length \(L\), can we interpolate these points with the given arc-length in \(\mathbb{R}^2 \)?

From the polynomial interpolation theorem, we know that for any set of \( k \) points \((x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k) \in \mathbb{R}^2 \) with distinct \( x_i \), there exists a polynomial \( P(x) \) of degree at most \( k-1 \) such that \(P(x_i) = y_i \text{ for } i = 1, 2, \ldots, k\). Such a polynomial is smooth and differentiable (i.e., it is a \( C^\infty \) function) over \( \mathbb{R} \), hence rectifiable, so it possesses a well-defined finite arc length over any closed interval.

Now, let us parameterize the polynomial \( P(x) \) as a parametric curve \( \mathbf{r}(t) = (t, P(t)) \), where \( t \) ranges over some interval \([a, b] \subset \mathbb{R}\).

Now let us compute its arc length.

The arc-length \( S \) of the curve \( \mathbf{r}(t) = (t, P(t)) \) from \( t = a \) to \( t = b \) is given by:

\( S = \int_a^b \sqrt{1 + \left(\frac{dP(t)}{dt}\right)^2} \, dt. \)

To achieve the desired total arc length \( L \), it is tempting to rescale the parameter: set \( \tau = \alpha t \) and view the same curve as a function of \( \tau \). But substituting \( u = \tau/\alpha \) shows that

\( S' = \int_{\alpha a}^{\alpha b} \sqrt{1 + \left(\frac{dP(\tau / \alpha)}{d(\tau / \alpha)}\right)^2} \, \frac{1}{\alpha} \, d\tau = \int_a^b \sqrt{1 + P'(u)^2} \, du = S. \)

In other words, arc length is a geometric quantity, invariant under reparametrization: rescaling \( t \) merely relabels the points of the same curve and cannot change its length. (Scaling the curve itself in space does scale the arc length, but it also moves the \( k \) points, so it answers a different question.)

To control the arc length while pinning the points, we deform the curve instead. First, note a necessary condition: \( L \) can be no smaller than the length of the polygonal path through the \( k \) points taken in order, since a straight segment is the shortest path between two points. Now suppose \( L \geq S \), where \( S \) is the arc length of the polynomial interpolant above. Let \( B(x) = \prod_{i=1}^{k} (x - x_i) \), a nonconstant polynomial vanishing at every \( x_i \), and consider the family \( P_\lambda(x) = P(x) + \lambda B(x) \) for \( \lambda \geq 0 \). Every \( P_\lambda \) interpolates the \( k \) points. Its arc length \( S(\lambda) \) depends continuously on \( \lambda \), equals \( S \) at \( \lambda = 0 \), and grows without bound as \( \lambda \to \infty \). By the intermediate value theorem, there exists \( \lambda^* \) with \( S(\lambda^*) = L \).

Thereby, for any \( L \geq S \), there exists a curve \( C \) that interpolates the \( k \geq 2 \) given points and has total arc length exactly \( L \) in \( \mathbb{R}^2 \).

Now, what about implementation? How could we implement an algorithm to execute this task?

It is recommended to visualize your way through an algorithm on paper first, then formalize it in words and symbols, make sure there are no semantic errors in the formalization, then code, then debug. You know, debugging is one of the most intellectually stimulating exercises, and one of the most exhausting ones. I am a MATLAB person, so here is a MATLAB function you could use as a starting point for this task ..

function [curveX, curveY] = curveFunction(points, totalArcLength)
    % Input:
    % points - Nx2 matrix where each row is a point [x, y]
    % totalArcLength - desired total arc length of the curve
    
    % Output:
    % curveX, curveY - vectors of x and y coordinates of the curve
    
    % Note: the final step rescales the whole curve uniformly, which also
    % rescales the input points; so the output matches the target length
    % at the price of moving the interpolated points. Pinning the points
    % while matching the length needs the deformation argument above
    % (see the sketch after this function).
    
    % Distances between consecutive input points (chord lengths)
    distances = sqrt(sum(diff(points).^2, 2));
    
    % Cumulative chordal length, normalized to [0, 1] as a parameter
    cumulativeArcLength = [0; cumsum(distances)];
    normalizedArcLength = cumulativeArcLength / cumulativeArcLength(end);
    
    % Desired number of points on the curve
    numCurvePoints = 100; % Change as needed
    
    % Dense parameter samples for the curve
    curveParam = linspace(0, 1, numCurvePoints);
    
    % Cubic-spline interpolation of x(t) and y(t)
    curveX = interp1(normalizedArcLength, points(:, 1), curveParam, 'spline');
    curveY = interp1(normalizedArcLength, points(:, 2), curveParam, 'spline');
    
    % Scale the curve to the desired total arc length, measuring the
    % length of the generated curve itself (not just the control polygon)
    splineLength = sum(hypot(diff(curveX), diff(curveY)));
    scale = totalArcLength / splineLength;
    curveX = curveX * scale;
    curveY = curveY * scale;
    
    % Plot the curve
    figure;
    plot(curveX, curveY);
    title('Curve Interpolation using Arc-Length and Points');
    xlabel('X');
    ylabel('Y');
    grid on;
end
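
To complement the function above, here is a minimal MATLAB sketch of the intermediate-value construction from the argument above, which interpolates the original points exactly while hitting the target length. Everything in it (the sample points, the target length \(L\), the bracketing loop) is illustrative rather than part of the original project code, and it assumes \(L\) is at least the arc length of the polynomial interpolant.

% Find lambda so that the graph of P(x) + lambda*B(x), which interpolates
% the data for every lambda >= 0, has total arc length L.
xi = [0 1 2 3];  yi = [0 2 1 3];         % sample data points (illustrative)
L  = 12;                                  % desired total arc length
p  = polyfit(xi, yi, numel(xi) - 1);      % interpolating polynomial P
b  = poly(xi);                            % B(x) = prod(x - x_i), vanishes at each x_i
pad = @(c) [zeros(1, numel(b) - numel(c)), c];  % align coefficient lengths
arclen = @(lam) integral(@(x) hypot(1, polyval(polyder(pad(p) + lam*b), x)), ...
                         min(xi), max(xi));     % arc length of P + lam*B
% Bracket a solution by doubling, then solve S(lambda) = L with fzero
hi = 1;
while arclen(hi) < L
    hi = 2 * hi;
end
lamStar = fzero(@(lam) arclen(lam) - L, [0, hi]);
fprintf('lambda* = %.6f gives arc length %.6f (target %.6f)\n', ...
        lamStar, arclen(lamStar), L);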

Key Takeaways and Possible Research Directions:

Key Takeaways:

  • Splines are important!
  • Constructing them is nontrivial, especially under multiple conflicting constraints, since such constraints significantly narrow the feasible search space of potential representative functions.
  • Progress in abstract mathematics makes the lives of computational engineers and professionals in applied numerical fields easier, as it provides them with greater spaces for creativity and discoveries of new computational tools.

Possible Future Research Directions:

“I will approach this question as one approaches a
hippopotamus: stealthily and from the side.”

– R. Mahony

I borrowed this quote from Prof. Justin Solomon’s Ph.D. thesis. I read parts of it this morning and found that the quote perfectly captures my perspective on tackling the main question of this project. In this context, the “side” approach would involve exploring the question through the lens of functional analysis. 🙂

Acknowledgments. I would like to express my sincere gratitude to our project mentor Sofia Di Toro Wyetzner and our teaching assistant Shanthika Naik for their continuous support, guidance, and insights offered to me and my project fellows during this interesting research journey, which prepared me well for my second project on “Differentiable Representations for 2D Curve Networks”. Moreover, I would like to thank my team fellows Charuka Bandara, Brittney Fahnestock, and Sachin Kishan for sharing interesting papers, which I am planning to read after SGI!

Bibliography

  1. Yu, Y., Li, X., & Ji, Y. (2024). On intersections of b-spline curves. Mathematics, 12(9), 1344. https://doi.org/10.3390/math12091344
  2. Silva, L. and Gay Neto, A. (2023). Geometry reconstruction based on arc splines with application to wheel-rail contact simulation. Engineering Computations, 40(7/8), 1889-1920. https://doi.org/10.1108/ec-11-2022-0666
  3. Yuksel, C. (2020). A class of \(C^2\) interpolating splines. ACM Transactions on Graphics, 39(5), 1-14. https://doi.org/10.1145/3400301
  4. Zorich, V. A. (2015). Mathematical Analysis I. Springer. https://doi.org/10.1007/978-3-662-48792-1
  5. Shilov, G. E. (1996). Elementary Functional Analysis (2nd edition). Dover.
  6. MacQueen, M. (1936). On the principal join of two curves on a surface. American Journal of Mathematics, 58(3), 620. https://doi.org/10.2307/2370980
  7. Sederberg, T. (2012). Computer Aided Geometric Design. Faculty Publications. https://scholarsarchive.byu.edu/facpub/1

Project github repo: (link)


  1. Geometric data refers to information that describes the shape, position, and properties of objects in space. It includes key components such as curves, surfaces, meshes, and volumes. ↩︎
  2. In this article, when we say curves, we usually refer to parametric curves. However, parametrized curves are not the same as curves in general ↩︎
  3. The term “polygonal approximation” should not be taken too literally; it simply means that the distance between successive points \( \gamma(t_i) \) is measured by the “straight-line” (Euclidean) distance between them. ↩︎