Category: Research

What it Takes to Get a SLAM Dunk, Part II

Post author By nicolaspigadas14
Post date August 10, 2024

This is the 2nd part of the two-part post series, prepared by Krishna and I. Krishna presented an overview of SLAM systems in a very intuitive and engaging way. In this part, I explore the future of SLAM systems in endoscopy and how our team plans to shape it.

Collaborators: Krishna Chebolu

Introduction

What about the future of SLAM endoscopy systems? To get an insight on where research is heading, we must first discuss the challenges posed by the task to localise an agent and map such a difficult environment, as well as the weaknesses of current systems.

On one hand, the environment of the inside of the human body, coupled with data/device heterogeneity and image quality issues, significantly hinder the performance of endoscopy SLAM systems [1], [2] due to:

1) Texture scarceness, scale ambiguity

2) Illumination variation

3) Bodies (foreign or not), fluids and their movement (e.g., mucus, mucosal movement)

4) Deformable tissues and occlusions

5) Scope-related issues (e.g., imagery quality variability)

6) Underlining scene dynamics (e.g., imminent corruption of frames with severe artefacts, large organ motion and surface drifts)

7) Data heterogeneity (e.g., population diversity, rare or inconspicuous disease cases, variability in disease appearances from one organ to the other, endoscope variability)

8) Difference in device manufacturers

9) Input of experts being required for their reliable development

10) The organ preparation process

11) Additional imaging quality issues (e.g. free/non-uniform hand motions and organ movements, different image modalities)

12) Real time performance (speed and accuracy trade-off)

Current research of endoscopic SLAM systems mainly focuses on the first 3 of the aforementioned challenges; the state-of-the-art pipelines focus on understanding depth despite the lack of texture, as well as handling lighting changes and foreign bodies like mucus that can be reflective or move and, thus, skew the mapping reconstruction.

Images 1, 2, 3: The images above showcase the three main problems that skew the tissue structure understanding and hinder the performance of mapping of SLAM systems in endoscopy: (1) foreign bodies that are reflective (2) lighting variations and (3) lack of texture. Image credits: [3], [3], [4].

On the other hand, we must pinpoint where the weaknesses of such systems lie. The three main modules of AI endoscopy systems, that operate on image data, are Simultaneous Localization and Mapping (SLAM), Depth Estimation (DE) and Visual Odometry (VO); with the last two being submodules of the broader SLAM systems. SLAM is a computational method that enables a device to map its environment while simultaneously determining its own position within that map, which is often achieved via VO; a technique that estimates the camera’s position and trajectory by examining changes across a series of images. Depth estimation is the process of determining the distance between a camera and the objects in its view by analyzing visual information from one or more images, which is crucial for SLAM to accurately map the environment in three dimensions and understand its surroundings more effectively. Attempting to use general purpose SLAM systems on endoscopy data clearly shows that DE and map reconstruction are underperforming, while localisation/VO is sufficiently captured. This conclusion was reached based on initial experiments; however, further investigations are warranted.

Though the challenges and system weaknesses that current research aims to address are critical aspects of the models’ usability and performance, there is still a wide gap between the curated settings under which these models perform and real-world clinical settings. Clinical applications are still uncommon, due to the lack of holistic and representative datasets, in conjuction with limited participation of clinical experts. This leads to models that lack generalisability; widely used supervised techniques are data voracious and require many human annotations, which, apart from scarce, are often imperfect or overfitted to predominant samples in cohorts. Novel deep learning methods should be steered towards training on diverse endoscopic datasets, the introduction of explainability of results and the interpretability of models, which are required to accelerate this field. Finally, suitable evaluation metrics (i.e. generalisability assessments and robustness tests) should be defined to determine the strength of developed methods in regards to clinical translation.

For a future of advanced and applicable AI endoscopy systems, the directions are clear, as discussed in [1]:

1) Endoscopy-specific solutions must be developed, rather than just applying pipelines from the computer vision field

2) Robustness and generalisation evaluation metrics of the developed solutions must be defined to set the standard to assess and compare model performance

3) Practicability, compactness and real time effectiveness should also be quantified

4) More challenging problems should be explored (subtle lesions instead of apparent lesions)

5) The developed models should be able to adapt to datasets produced in different clinics, using different endoscopes, in the context of varying manifestations of diseases

6) Multi-modal and multi-scale integration of data should be incorporated in these systems

7) Clinical validation is necessary to steadily integrate these systems in the clinical process

Method

But how do we envision the future of SLAM endoscopy systems?

Our team aims to address directly the issues of texture scarceness, illumination variation and handling of foreign bodies, while indirectly combating some of the rest of the challenges. Building upon state-of-the-art SLAM systems, which already handle localisation/VO sufficiently, we aim to further enhance their mapping process, by integrating a state-of-the-art endoscopy monocular depth estimation pipeline [3] and by developing a module to understand lighting variations in the context of endoscopic image analysis. The aforementioned module will have a corrective nature, automatically adjusting the lighting in the captured images to ensure that the visuals are clear and consistent. Potentially, it could also enhance the image quality by adjusting brightness, contrast, and other image parameters in real-time, standardizing the images of different frames of the endoscopy video. As the module’s task is to improve the visibility and consistency of the image features, it would consequentially also support the depth estimation process, by providing clearer cues and contrast for accurate depth calculations by the endoscopy monocular depth estimation pipeline. Thus, the module would ensure a more consistent and refined input to the SLAM model, rather than raw endoscopy data, which suffer from inconsistencies and heterogeneities, never seen before by the model. With the aforementioned integrations we aim to develop a specialised SLAM endoscopy system and test it in the context of clinical colonoscopy [5]. Ideally, the plan is to first train and test our pipeline on a curated dataset to test its performance under controlled settings and then it would be of great interest to adjust each part of the pipeline to make it perform on real-world clinical data or across multiple datasets. This will provide us with the opportunity to see where a state-of-the-art SLAM endoscopy system stands in the context of real-world applicability and help quantify and address the issues explored in the previous section.

Image 4: State-of-the-art clinical mesh reconstruction using the endoscopy monocular depth estimation pipeline [5].

Image 5: The endoscopy monocular depth estimation pipeline also extracts state-of-the-art depth estimation in endoscopy videos.

Colonoscopy Data

The type of endoscopy procedure we choose to develop our pipeline for is colonoscopy; a medical procedure that uses a flexible fibre-optic instrument equipped with a camera and light (a colonoscope) to examine the interior of the colon and rectum. More specifically, we select to work with the Colonoscopy 3D Video Dataset (C3VD) [5]. The significance of this dataset study is the fact that it provides high quality ground truth data, obtained using a high-definition clinical colonoscope and high-fidelity colon models, creating a benchmark for computer vision pipelines. The study introduced a novel multimodal 2D-3D registration technique to register optical video sequences with ground truth rendered views of a known 3D model.

Video 1: C3VD dataset: Data from the colonoscopy camera (left) and depth estimation (right) extracted by a Generative Adversarial Network (GAN). Video credits: [5]

Conclusion

SLAM systems are the state-of-the-art for localisation and mapping and endoscopy is the gold standard procedure for many hollow organs. Combining the two, we get a powerful medical tool that can not only improve patient care, but also be life-defining in some cases. Its use cases can be prognostic, diagnostic, monitoring and even therapeutic, ranging from, but not limited to: disease surveillance, inflammation monitoring, early cancer detection, tumour characterisation, resection procedures, minimally invasive treatment interventions and therapeutic response monitoring. With the development of SLAM endoscopy systems, the endoscopy surgeon has acquired a visual overview of various environments inside the human body, that would otherwise be impossible. Endoscopy being highly operator-dependent with grim clinical outcomes in some disease cases, makes reliable and accurate automated system guidance imperative. Thus, in recent years, there has been a significant increase in the publication of endoscopic imaging-based methods within the fields of computer-aided detection (CADe), computer-aided diagnosis (CADx) and computer-assisted surgery (CAS). In the future, most designed methods must be more generalisable to unseen noisy data, patient population variability and variable disease appearances, giving an answer to the multi-faceted challenges that the latest models fail to address, under actual clinical settings.

This post concludes part 11 of What it Takes to Get a SLAM Dunk.

Image 6: Michael Jordan (considered by me and many as the G.O.A.T.) performing his most famous dunk. Image credits: ScienceABC

References

[1] Ali, S. Where do we stand in AI for endoscopic image analysis? Deciphering gaps and future directions. npj Digit. Med. 5, 184 (2022). https://doi.org/10.1038/s41746-022-00733-3

[2] Ali, S., Zhou, F., Braden, B. et al. An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy. Sci Rep 10, 2748 (2020). https://doi.org/10.1038/s41598-020-59413-5

[3] Paruchuri, A., Ehrenstein, S., Wang, S., Fried, I., Pizer, S. M., Niethammer, M., & Sengupta, R. Leveraging near-field lighting for monocular depth estimation from endoscopy videos. In Proceedings of the European Conference on Computer Vision (ECCV), (2024). https://doi.org/10.48550/arXiv.2403.17915

[4] https://commons.wikimedia.org/wiki/File:Stomach_endoscopy_1.jpg

[5] Bobrow, T. L., Golhar, M., Vijayan, R., Akshintala, V. S., Garcia, J. R., & Durr, N. J. Colonoscopy 3D video dataset with paired depth from 2D-3D registration. Medical Image Analysis, 102956 (2023). https://doi.org/10.48550/arXiv.2206.08903

Post author By nicolaspigadas14
Post date August 10, 2024
Tags Endoscopy, Endoscopy Image Standardisation, Future, Monocular Depth Estimation, SLAM

Research

What it Takes to Get a SLAM Dunk, Part I

Post author By kris.chebo
Post date August 10, 2024

In this two-part post series, Nicolas and I dive deeper into SLAM systems– our project’s focus for the past two weeks. In this part, I introduce and cover the evolution of SLAM systems. In the next part, Nicolas harnesses our interest by discussing the future. By the end of both parts, we should be able to give you an overview of What it Takes to Get a SLAM Dunk.

Collaborators: Nicolas Pigadas

Introduction

Simultaneous Localization and Mapping (SLAM) systems have become a standard in various technological fields, from autonomous robotics to augmented reality. However, in recent years, this technology has found a particularly unique application in medical imaging– in endoscopic videos. But what is SLAM?

Figure 1: A sample image using SLAM reconstruction from SG News Desk.

SLAM systems were conceptualized in robotics and computer vision for navigation purposes. Before SLAM, the fields employed more elementary methods,

Localization: the ability of a system (a robot, say) to determine its position within a known environment. This can be done using odometry, landmark-based localization, and dead reckoning.
Mapping: the process of creating a representation of an environment, typically in the form of a 2D or 3D map. This was done using grid-based and feature-based methods.

Figure 2: Example of large-scale 3D semantic mapping by a vehicle.

You may be thinking, Krishna, you just described SLAM systems, it sounds like. You are right, but the localizing and mapping were separate processes. So a robot would go through the pains of the Heisenberg principle, i.e., the robot would either localize or map– the or is exclusionary.

It was fairly obvious, but still daunting what the next step in research would be. Before we SLAM dunk our basketball, we must do a few lay-ups and free-throw shoots first.

Precursors to SLAM

Here are some inspirations that contributed to the development of SLAM

Probabilistic robotics: The introduction of probabilistic approaches, such as Bayesian filtering, allowed robots to estimate their position and map the environment with a degree of uncertainty, paving the way for more integrated systems.
Kalman filtering: a mathematical technique for estimating the state of a dynamic system. It allowed for continuous estimation of a robot’s position and could be invariant to noisy sensor data.
Cognitive Mapping in Animals: Research in cognitive science and animal navigation provided theoretical inspiration, particularly the idea that animals build mental maps of their environment while simultaneously keeping track of their location.

Figure 3: Spatial behavior and cognitive mapping of mice with aging. Image from Nature.

SLAM Dunk – A Culmination (some real Vince Carter stuff)

Finally, many researchers agreed that the separation of localizing and mapping was ineffective, and great efforts went into their integration. SLAM was developed. The goal was to enable systems to explore and understand an unknown environment autonomously, they needed to localize and map the environment simultaneously, with each task informing and improving the other.

With its unique ability to localize and map, researchers found SLAM’s use in any sensory device. Some of SLAM’s earlier use were sensor-based; so data would be inputted from range finders, sonar, and LIDAR; in the late 80s and early 90s. It is good to note that the algorithms were computationally intensive– and still are.

As technology evolved, a vision-based SLAM emerged. This shift was inspired by the human visual system, which navigates the world primarily through sight, enabling more natural and flexible mapping techniques.

Key Milestones

With the latest iterations of SLAM being exponentially better than the origin, it is important to recognize the journey. Here are notable SLAM systems:

EKF-SLAM (Extended Kalman Filter SLAM): One of the earliest and most influential SLAM algorithms, EKF-SLAM, laid the foundation for probabilistic approaches to SLAM, allowing for more accurate mapping and localization.
FastSLAM: Introduced in the early 2000s, FastSLAM utilized particle filters, making it more efficient and scalable. This development was crucial in enabling real-time SLAM applications.
Visual SLAM: The transition to vision-based SLAM in the mid-2000s opened new possibilities for the technology. Visual SLAM systems, such as PTAM (Parallel Tracking and Mapping), enabled more detailed and accurate mapping using standard cameras, a significant step toward broader applications.

Figure 4: Left LSD-SLAM, right ORB-SLAM. Image found in fzheng.me

From Robotics to Endoscopy (Medical Vision)

As SLAM technology matured, researchers explored its potential beyond traditional robotics. Medical imaging, particularly endoscopy, presented a fantastic opportunity for SLAM. Endoscopy is a medical procedure involving a flexible tube with a camera to visualize the body’s interior, often within complex and dynamic environments like the gastrointestinal tract.

Figure 5: Endoscopy procedure overview. Image from John Hopkins Medicine.

It is fairly trivial why SLAM could be applied to endoscopic and endoscopy-like procedures to gain insights and make more medically informed decisions. Early work focused on using visual SLAM to navigate the gastrointestinal tract, where the narrow and deformable environment presented significant challenges.

One of the first successful implementations involved using SLAM to reconstruct 3D maps of the colon during colonoscopy procedures. This approach improved navigation accuracy and provided valuable information for diagnosing conditions like polyps or tumors.

Researchers also explored the integration of SLAM with other technologies, such as optical coherence tomography (OCT) and ultrasound, to enhance the quality of the maps and provide additional layers of information. These efforts laid the groundwork for more advanced SLAM systems capable of handling the complexities of real-time endoscopic navigation.

Figure 6: Visual of Optical Coherence Tomography from News-Medical.

Endoscopy SLAMs – What Our Group Looked At

As a part of our study, we looked at some presently used and state-of-the-art SLAM systems. Below are the three that various members of our team attempted:

NICER-SLAM (RGB): a dense RGB SLAM system that simultaneously optimizes for camera poses and a hierarchical neural implicit map representation, which also allows for high-quality novel view synthesis.
ORB3-SLAM (RBG): (there is also ORB1 and ORB2) ORB-SLAM3 is the first real-time SLAM library able to perform Visual, Visual-Inertial, and Multi-Map SLAM with monocular, stereo, and RGB-D cameras, using pin-hole and fisheye lens models. In all sensor configurations, ORB-SLAM3 is as robust as the best systems available in the literature and significantly more accurate.
DROID-SLAM (RBG): a new deep learning-based SLAM system. DROID-SLAM consists of recurrent iterative updates of camera pose and pixel-wise depth through a Dense Bundle Adjustment layer.

Figure 7: Demo pictures from Gaussian Splatting SLAM.

Some other SLAM systems that our team would have loved to try our hand at are:

Gaussian Splatting SLAM: first application of 3D Gaussian Splatting in monocular SLAM, the most fundamental but the hardest setup for Visual SLAM.
GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM. This system uses a deformable point cloud as the scene representation and achieves lower trajectory error and higher rendering accuracy compared to competitive approaches.

More SLAM methods can be found in this survey.

Conclusion

This concludes part 1 of What it Takes to Get a SLAM Dunk. This post should have given you a gentle, but robust-enough introduction to SLAM systems. Vince Carter might even approve.

Figure 9: An homage to Vince Carter, arguably the greatest dunk-er ever. Image from Bleacher Report.

Post author By kris.chebo
Post date August 10, 2024
Tags localization, mapping, reconstruction, sgi2024, SLAM

Research

A Study on Surface Reconstruction from Signed Distance Data Part 2: Error Methods

Post author By hong.x
Post date August 9, 2024

Primary mentors: Silvia Sellán, University of Toronto and Oded Stein, University of Southern California

Volunteer assistant: Andrew Rodriguez

Fellows: Johan Azambou, Megan Grosse, Eleanor Wiesler, Anthony Hong, Artur RB Boyago, Sara Samy

After talking about what are some of the ways to reconstruct sdfs (see part one), we now study the question of how to evaluate their performance. Two classes of error function are considered, the first more inherent to our sdf context, and the second more general in all situations. We will first show results by our old friends, Chamfer distance, Hausdorff distance, and area method (see here for example), and then introduce the inherent one.

Hausdorff and Chamfer Distance

The Chamfer distance provides an average-case measure of the dissimilarity between two shapes. It calculates the sum of the distances from each point in one set to the nearest point in the other set. This measure is particularly useful when the goal is to capture the overall discrepancy between two shapes while being less sensitive to outliers. The Chamfer distance is defined as:
$$
d_{\mathrm{Chamfer }}(X, Y):=\sum_{x \in X} d(x, Y)=\sum_{x \in X} \inf _{y \in Y} d(x, y)
$$

The Bi-Chamfer distance is an extension that considers the average of the Chamfer distance computed in both directions (from $X$ to $Y$ and from $Y$ to $X$ ). This bidirectional measure provides a more balanced assessment of the dissimilarity between the shapes:
$$
d_{\mathrm{B} \mathrm{-Chamfer }}(X, Y):=\frac{1}{2}\left(\sum_{x \in X} \inf {y \in Y} d(x, y)+\sum{y \in Y} \inf _{x \in X} d(x, y)\right)
$$

The Hausdorff distance, on the other hand, measures the worst-case scenario between two shapes. It is defined as the maximum distance from a point in one set to the closest point in the other set. This distance is particularly stringent because it reflects the largest deviation between the shapes, making it highly sensitive to outliers.

The formula for Hausdorff distance is:
$$
d_{\mathrm{H}}^Z(X, Y):=\max \left(\sup {x \in X} d_Z(x, Y), \sup {y \in Y} d_Z(y, X)\right)
$$

We analyze the Hausdorff distance for the uneven capsule (the shape on the right; exact sdf taken from there) in particular.

This plot visually compares the original zero level set (ground truth) with the reconstructed polyline generated by the marching squares algorithm at a specific resolution. The black dots represent the vertices of the polyline, showing how closely the reconstruction matches the ground truth, demonstrating the efficacy of the algorithm at capturing the shape’s essential features.

This plot shows the relationship between the Hausdorff distance and the resolution of the reconstruction. As resolution increases, the Hausdorff distance decreases, illustrating that higher resolutions produce more accurate reconstructions. The log-log plot with a linear fit suggests a strong inverse relationship, with the slope indicating a nearly quadratic decrease in Hausdorff distance as resolution improves.

Area Method for Error

Another error method we explored in this project is an area-based method. To start this process, we can overlay the original polyline and the generated polyline. Then, we can determine the area between the two, counting all regions as positive area. Essentially, this means that if we take values inside a polyline to be negative and outside a polyline to be positive, the area counted as error consists of the set of all regions where the sign of the original polyline’s SDF is not equal to the sign of the generated polyline’s SDF. The resultant area corresponds to the error of the reconstruction.

Here is a simple example of this area method using two quadrilaterals. The area between them is represented by their union (all blue area) with their intersection (the darker blue triangle) removed:

Here is an example of the area method applied to a star, with the area quantity corresponding to error printed at the top:

Inherent Error Function

We define the error function between the reconstructed polyline and the zero level set (ZLS) of a given signed distance function (SDF) as:
$$
\mathrm{Error}=\frac{1}{|S|} \sum_{s \in S} \mathrm{sdf}(s)
$$

Here, $S$ represents a set of sample points from the reconstructed polyline generated by the marching squares algorithm, and $|\mathrm{S}|$ denotes the cardinality of this set. The sample points include not only the vertices but also the edges of the reconstructed polyline.

The key advantage of this error function lies in its simplicity and directness. Since the SDF inherently encodes the distance between any query point and the original polyline, there’s no need to compute distances separately as required in Hausdorff or Chamfer distance calculations. This makes the error function both efficient and well-suited to our specific context, providing a more streamlined and integrated approach to measuring the accuracy of the reconstruction.

Note that one variant one can consider is to square the $\mathrm{sdf}(s)$ part to get rid of the signed nature, because the signs can cancel with each other for the reconstructed polyline is not always entirely inside the ground truth or entirely cotains the ground truth.

Below are one experiment we did for computing using this inherent error function, the unsquared one. We let the circle sdf be fixed and let the square sdf “escape” from the circle in a constant speed of $0.2$. That is, we used the union of a fixed circle and a square with changing position as a series of shapes to do analysis. We then draw with color map used in this repo for the ground truths and the marching square results using the common color map.

The inherent error of each union After several trials, one simple observation drawn from this escapaing motion is that when two shapes are more confounding with each other, as in the initial stage, the error is higher; and when the image starts to have two separated connected components, the error rises again. The first one can perhaps be explained by what the paper pointed out as pseudo-sdfs, and the second one is perhaps due to occurence of multipal components.

Here is another experiment. We want to know how well this method can simulate the physical characteristics of an object’s rugged level. We used a series of regular polygons. Although the result is theoretically unintheresting as the errors are all nearly zero, this is visually conformting.

Post author By hong.x
Post date August 9, 2024

Math Research

Spline Construction with Closed-form Arc-Length Solution: A Guided Tour

Post author By ehsan.ali
Post date August 8, 2024

^{Author: Ehsan Shams (Alexandria Univerity, Egypt)
Project Mentor: Sofia Di Toro Wyetzner (Stanford University, USA)
Volunteer Teaching Assistant: Shanthika Naik (CVIT, IIIT Hyderabad, India)}

In my first SGI research week, I went on a fascinating exploratory journey into the world of splines under the guidance of Sofia Wyetzner, and gained an insight into how challenging yet intriguing their construction can be especially under tight constriants such as the requirement for a closed-form expression for their arc-length. This specific construction was the central focus of our inquiry, and the main drive for our exploration.

Splines are important mathematical tools for they are used in various branches of mathematics, including approximation theory, numerical analysis, and statistics, and they find applications in several areas such as Geometry Processing (GP). Loosely speaking, such objects can be understood as mappings that take a set of discrete data points and produce a smooth curve that either interpolates or approximates these points. Thus, it becomes natural immediately to see that they are of great importance in GP since, in the most general sense, GP is a field that is mainly concerned with transforming geometric data¹ from one form to another. For example, splines like Bézier and B-spline curves are foundational tools for curve representation in computer graphics and geometric design (Yu, 2024).

Having access to arc lengths for splines in GP is essential for many tasks, including path planning, robot modeling, and animation, where accurate and realistic modeling of curve lengths is critically important. For application-driven purposes, having a closed formula for arc-length computation is highly desirable. However, constructing splines with this arc-length property that can interpolate $ k $ arbitrary data points reasonably well is indeed a challenging task.

Our mentor Sofia, and Shanthika guided us through an exploration of a central question in spline formulation research, as well as several related tangential questions:

Our central question was:

Given $ k $ points, can we construct a spline that interpolates these points and outputs the intermediate arc-lengths of the generated, curve, with some continuity at control points?

And the tangential ones were:

1. Can we achieve $ G^1 $ / $ C^1$ / $ G^2 $ / $C^2$ continuity at these points with our spline?
2. Given $k$ points and a total arc-length, can we interpolate these points with the given arc-length in $ \mathbb{R}^2 $?

In this article, I will share some of the insights I gained from my first week-long research journey, along with potential future directions I would like to pursue.

Understanding Splines and their Arc-length

What are Splines?

A spline is a mapping $ S: [a,b] \subset \mathbb{R} \to \mathbb{R}^n $ defined as:

$ S(t) = \begin{cases} s_1(t) & \text{for } t \in [t_0, t_1] \\ s_2(t) & \text{for } t \in [t_1, t_2] \\ \vdots \\ s_n(t) & \text{for } t \in [t_{n-1}, t_n] \end{cases} $

where $ a = t_0 < t_1 < \cdots < t_n = b $, and $ s_i: [t_i, t_{i+1}] \to \mathbb{R}^n $ are defined such that, $S(t)$ ensures $ C^k $ continuity at each internal point $ t_j $, meaning:

$ s_{i-1}^{(m)}(t_j) = s_i^{(m)}(t_j) $ for $ m = 0,1 \dots, k $

where $ s_{i-1}^{(m)} $ and $ s_i^{(m)} $ are the $ m $-th derivatives of $ s_{i-1}(t) $ and $ s_i(t) $ respectively.

In other words, a spline is a mathematical construct used to create a smooth curve $ S(t) $ by connecting a sequence of simpler piecewise segments $ s_i(t) $ in a continuous manner. These segments $ s_i(t) $ for $ i = 1, 2, \ldots, n $ are defined on subintervals $ [t_i, t_{i+1}] $ of the parameter domain, and are carefully joined end-to-end. The transitions at the junctions $ \{ t_i \}_{i=1}^{n-1} $ (also known as control points) maintain a desired level of smoothness, typically defined by the continuity of derivatives up to order $ k $ on $ \{s_i(t)\}_{i=1}^n $.

One way to categorize splines is based on the types of functions $s_i$ they incorporate. For instance, some splines utilize polynomial functions such as Bézier splines and B-splines, while others may employ trigonometric functions such as Fourier splines or hyperbolic functions. However, the most commonly used splines in practice are those based on polynomial functions, which define one or more segments. Polynomial splines are particularly valuable in various applications because of their computational simplicity and flexibility in modeling curves.

Arc Length Calculation

Intuitively, the notion of arc-length of a curve² can be understood as the numerical measurement of the total distance traveled along the curve from one endpoint to another. This concept is fundamental in both calculus and geometry because it provides a way to quantify the length of some curves, which may not be a straight line. To calculate the arc length of smooth curves, we use integral calculus. Specifically, we apply a definite integral formula – (presented in the following theorem) but, let us first define the concept formally.

Definition. Let $ \gamma : [a, b] \to \mathbb{R}^n $ be a parametrized curve, and $ P = \{ a = t_0, t_1, \dots, t_n = b \} $ a partition of $ [a, b] $. The polygonal approximate³ length of $ \gamma $ from $P$ is given by the sum of the Euclidean distances between the points $ \gamma(t_i)$ for $ i = 0, 1, \dots, n $:

$L_P(\gamma) = \sum_{i=0}^{n} |\gamma(t_{i+1}) – \gamma(t_i)|$

where $ | \cdot | $ denotes the Euclidean norm in $ \mathbb{R}^n $. This polygonal approximation becomes a better estimate of the actual length of the curve as the partition $ P$ becomes finer (i.e., as the maximum distance between successive $ t_i $ tends to zero). The actual length of the curve can be defined as:

$ L(\gamma) = \sup_P L_P(\gamma) $

If the curve $ \gamma $ is sufficiently smooth, the actual length of the curve can be computed using definite integration as shown in the following theorem.

Arc Length Theorem. Let $ \gamma : [a, b] \to \mathbb{R}^n $ be a $ C^1 $ curve. The arc length $ L(\gamma) $ of $\gamma$ is given by:

$ L(\gamma) = \int_a^b |\gamma'(t)| \, dt $

where $\gamma'(t) $ is the derivative of $\gamma$ with respect to $t$.

The challenge

As mentioned earlier, in the context of spline construction for GP tasks, ideally one is interested in constructing splines that have closed-form solutions for their arc length (a formula for computing their arc-length). However, curves with this property for their arc-length are relatively rare because the arc length integral often leads to elliptic integrals⁴ or other forms that do not have elementary antiderivatives. However, there are some curves for which the arc length can be computed exactly using a closed formula. Here are some examples: Straight lines, circles, and parabolas under certain conditions have closed-form solutions for arc length.

Steps to tackle the central question …

Circular Splines: A Starting Point.

In day one, our mentor Sofia went with us through a paper titled “A Class of $C^2 $ Interpolating Splines” (Yuksel, 2020). In this work, the author introduces a novel class of non-polynomial parametric splines to interpolate given $ k $ control points. Two components define their class construction: interpolating functions and blending functions (defined later). Each interpolating function $F_j \in \{F_i \}_{i=1}^n$ defines three consecutive control points, and the blending functions $ \{ B_i \}_{i=1}^m $ combines each two consecutive interpolating functions, forming a smooth curve between two control points. The blending functions are chosen so that $ C^2 $-continuity everywhere is ensured independent of the choice of the interpolating functions. They use trignometric blending functions.

This type of formulation was constructed to attain some highly desirable properties in the resulting interpolating spline not previously demonstrated by other spline classes, including $ C^2 $ continuity everywhere, local support, and the ability to guarantee self-intersection-free curve segments regardless of the placement of control points and form perfect circular arcs. This paper served as a good starting point in light of the central question under consideration because among the interpolating functions they introduce in their paper are circular curves which have closed formulas for arc-length computation. In addition, it gives insight into the spline formulation practice. However, circular interpolating functions are not without their limitations; their constant curvature makes them difficult to reasonably interpolate arbitrary data, and they look bizarre sometimes.

Interesting note: The earliest documented reference – to the best of my knowledge – discussing the connection of two interpolating curves with a smooth curve dates back to Macqueen’s research in 1936 (MacQueen, 1936) Macqueen’s paper, titled “On the Principal Join of Two Curves on a Surface,” explores the concept of linking two curves on a surface.

Here is a demo constructed by the author to visualize the resulting output spline from their class with different interpolating functions, and below is me playing with the different interpolating functions, and looking at how bizarre the circular interpolating function looks when you throw out data points in an amorphus way.

While playing with the demo, the Gnomus musical track by Modest Mussorgsky was playing in parallel in the back of my mind so I put there for you too. It is a hilarious coincidence that the orchestra goes mad when it is the circular spline’s turn to interpolate the control points, and it does so oddly and bizarrely than the other splines in question. It even goes beyond the boundary of the demo. Did you notice that? 🙂

By the end of the day, I was left with two key inquiries, and a starting point for investigating them:

How do we blend desirable interpolating functions to construct splines with the properties we want? Can we use a combination of circular and elliptical curves to achieve more flexible and accurate interpolation for a wider variety of data points while maintaining a closed form for their arc length? What other combinations could serve us well?
I thought to myself: I should re-visit my functional analysis course as a starting point to think about this in a clear way.

From day two to five, we followed a structured yet not restrictive approach, akin to “we are all going to a definite destination but at the same time, everyone could stop by to explore something that captured their attention along the way if they want to and share what intrigued them with others”. This approach was quite effective and engaging:

Implementing a User-Interactive: Our first task was to develop a simple user interface for visualizing existing spline formulations. My SGI project team and friends—Charuka Bandara, Brittney Fahnestock, Sachin Kishan, and I—worked on this in Python (for Catmull-Rom splines) and MATLAB (for Cubic and Quadratic splines). This tool allowed us to visualize how different splines behave and change shape whenever the control points change, also restored my love for coding as I have not coded in a while … you know nothing is more joyful than watching your code executing exactly what you want it to do, right?
Below is a UI for visualizing a Cubic Spline. Find the UI for the Quadratic Spline, and the Catmull-Rom here.

Interactive Cubic Spline

Exploring Blending Functions Method: As a natural progression towards our central inquiry, and a complementary task to reading Yuksel’s paper, we eventually found our way to exploring blending functions—a topic I had been eagerly anticipating.

Here, I decided to pause and explore more about the blending function method in spline formulation.

The blending function method, is a method that provides a way to construct a spline $ S(t) $ as a linear combination of a set of control points $ \{p_i(t)\}_{i=1}^n $ and basis functions (blending functions) $ \{B_i(t)\}_{i=1}^n $ in the following manner:

$ S(t) = \sum_{i=1}^n p_i B_i(t)$ (*)

where:
$ t $: is the parameter defined over the interval of interest $ [a,b] $

These blending functions $B_i(t)$ typically exhibit certain properties that govern, for example, the smoothness, continuity, shape, and preservation of key characteristics that we desire in the resulting interpolating splines. Thus, by carefully selecting and designing those blending functions, it is possible to tailor the behaviour of spline interpolation to meet the specific requirements and achieve desired outcomes.

Below are some of the properties of important blending functions:

Partition of Unity: $ \sum_{i=1}^n B_i(t) =1, \forall t \in [a,b] $, also called coordinate system independent. This property is important because it provides a convex combination of the control points in question, and this is something you need to ensure that the curve does not change if the coordinate changes, one way to visualize this is by imagining that the control points in questions are beads sitting in an amorphous manner on a sheet of paper and the interpolating curve as a thread going through them, and you move the sheet of paper around, what we need is that the thread that goes through these beads does not move around as well, and this is what this property means. Notice that if you pick an arbitrary set of blending functions, this property is not immediately actualized, and the curve would change.
Local Support: Each blending function $ B_i(t) \neq 0 \forall t \in I and i=1,2, … , n $ where $I \subset [a,b] $ is the interval of interest and vanishes everywhere else on the domain. This property is important because it ensures computational efficiency. With this property actualized in one’s blending functions, one does not have to worry about consequences on their interpolating curve if they are to modify one control point .. for it will only affect a local region in the curve, and not the entire curve.
Non-negativity: Blending functions are often non-negative over the domain of definition $ [a,b] $. This property is called convex hull. It is important for maintaining stability and predictability of the interpolating spline. It prevents the curve from oscillating wildly or provides unfaithful representation of the data point in question.
Smoothness: Blending functions dictate the overall smoothness of the resulting spline since the space of $ C^k (\mathcal{K})$ ( $k$-times continously differentiable functions defined on a closed and compact set $\mathcal{K}$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$.
Symmetry: Blending functions that are symmetric about the central control point, do not change if the points are ordered in reverse. In this context, symmetry is assured, if and only if, $ \sum_{i=1}^n B_i(t) p_i = \sum_{i=1}^n B_i ((a+b)-t) p_{n-i} $ this holds if $B_i(t) = B_{n-i}((a+b)-t) $. For instance, Timmer’s parametric cubic, and Ball’s cubic – (a variant of cubic Bézier) – curve obey this property.

In principle, there can be as many properties imposed on the blending functions depending on the desired aspects one wants in their interpolating spline $ S(t) $.

Remark. The spline formulation (*) describes the weighted sum of the given control points $ \{p_i\}_{i=1}^n$. In other words, each control point is influencing the curve by pulling it in its direction, and the associated blending function is what determines the strength of this influence and pull. Sometimes, one does not need to use blending functions in trivial cases.

Brain-storming for new spline formulation: Finally, we were prepared for our main task. We brainstormed new spline formulations, in doing so, we first looked at different interpolating curves such as catenaries, parabolas, circles for interpolation and arc-length calculation, explored $C^1$ and $ C^2 $ continuity at control points, did the math on papers, which something I miss nowadays, for the 3-point, and then laid down the foundation for the $k$-point interpolation problem. I worked with the parabolas because I love them.

In parallel, I looked a bit into tangential question two … it is an interesting question:

Given $k$ points and a total arc-length $L$, can we interpolate these points with the given arc-length in $\mathbb{R}^2 $?

From the polynomial interpolation theorem, we know that for any set of $ k $ distinct points $(x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k) \in \mathbb{R}^2 $, there exists a polynomial $ P(x) $ of degree $ k-1 $ such that: $P(x_i) = y_i \text{ for } i = 1, 2, \ldots, k.$. Such a polynomial is smooth and differentiable (i.e., it is a $ C^\infty $ function) over $ \mathbb{R} $ thus rectifiable so it possesses a well-defined finite arc-length over any closed interval.

Now, let us parameterize the polynomial $ P(x) $ as a parametric curve $ \mathbf{r}(t) = (t, P(t)) $, where $ t $ ranges over some interval $[a, b] \subset \mathbb{R}$.

Now let us compute its arc-length,

The arc-length $ S $ of the curve $ \mathbf{r}(t) = (t, P(t)) $ from $ t = a $ to $ t = b $ is given by:

$ S = \int_a^b \sqrt{1 + \left(\frac{dP(t)}{dt}\right)^2} \, dt. $

To achieve the desired total arc-length $ L $, we rescale the parameter $ t $. Define a new parameter $ \tau $ as: $ \tau = \alpha t. $

Now, the new arc-length ( S’ ) in terms of $ \tau $ is:

$ S’ = \int_{\alpha a}^{\alpha b} \sqrt{1 + \left(\frac{dP(\tau / \alpha)}{d(\tau / \alpha)}\right)^2} \frac{d(\tau / \alpha)}{d\tau} \, d\tau.$

Since $ \frac{d(\tau / \alpha)}{d\tau} = \frac{1}{\alpha} $, this simplifies to:

$ S’ = \int_{\alpha a}^{\alpha b} \sqrt{1 + \left(\frac{dP(\tau / \alpha)}{d(\tau / \alpha)}\right)^2} \frac{1}{\alpha} \, d\tau.$

$S’ = \frac{1}{\alpha} \int_{\alpha a}^{\alpha b} \sqrt{1 + \left(\frac{dP(\tau / \alpha)}{d(\tau / \alpha)}\right)^2} \, d\tau.$

$S’ = \frac{S}{\alpha}.$. To ensure $ S’ = L $, choose $ \alpha = \frac{S}{L} $.

Thereby, by appropriately scaling the parameter $ t $, we can adjust the arc-length to match $ L $. Thus, there exists a curve $ C $ that interpolates the $ k \geq 2 $ given points and has the total arc-length $ L $ in $ \mathbb{R}^2 $.

Now, what about implementation? how could we implement an algorithm to execute this task?

It is recommended to visualize your way through an algorithm on a paper first, then formalize to words and symbols, make sure there are no semantic errors in the formalization then code then debug. You know debugging is one of the most intellectually stimulating exercises, and exhausting ones. I am a MATLAB person so here is a MATLAB function you could use to achieve this task ..

function [curveX, curveY] = curveFunction(points, totalArcLength)
    % Input: 
    % points - Nx2 matrix where each row is a point [x, y]
    % totalArcLength - desired total arc length of the curve
    
    % Output:
    % curveX, curveY - vectors of x and y coordinates of the curve
    
    % Number of points
    n = size(points, 1);
    
    % Calculate distances between consecutive points
    distances = sqrt(sum(diff(points).^2, 2));
    
    % Calculate cumulative arc length
    cumulativeArcLength = [0; cumsum(distances)];
    
    % Normalize cumulative arc length to range from 0 to 1
    normalizedArcLength = cumulativeArcLength / cumulativeArcLength(end);
    
    % Desired number of points on the curve
    numCurvePoints = 100; % Change as needed
    
    % Interpolated arc length for the curve
    curveArcLength = linspace(0, 1, numCurvePoints);
    
    % Interpolated x and y coordinates
    curveX = interp1(normalizedArcLength, points(:, 1), curveArcLength, 'spline');
    curveY = interp1(normalizedArcLength, points(:, 2), curveArcLength, 'spline');
    
    % Scale the curve to the desired total arc length
    scale = totalArcLength / cumulativeArcLength(end);
    curveX = curveX*scale;
    curveY = curveY*scale;

    %plot(curveX, curveY);

 % Plot the curve
    figure;
    plot(curveX, curveY);
    hold on;
    title('Curve Interpolation using Arc-Length and Points');
    xlabel('X');
    ylabel('Y');
    grid on;
    hold off;
end

Key Takeaways, and Possible Research Directions:

Key Takeaway:

Splines are important!
Constructing them is nontrivial especially under multiple conflicting constraints, as it significantly narrows the feasible search space of potential representative functions.
Progress in abstract mathematics makes the lives of computational engineers and professionals in applied numerical fields easier, as it provides them with greater spaces for creativity and discoveries of new computational tools.

Possible Future Research Directions:

“I will approach this question as one approaches a

hippopotamus: stealthily and from the side.”

– R. Mahony

I borrowed this quote from Prof. Justin Solomon’s Ph.D. thesis. I read parts of it this morning and found that the quote perfectly captures my perspective on tackling the main question of this project. In this context, the “side” approach would involve exploring the question through the lens of functional analysis. 🙂

Acknowledgments. I would like to express my sincere gratitude to our project mentor Sofia Di Toro Wyetzner and Teaching Assistant Shanthika Naik for their continuous support, guidance, and insights to me and my project fellows during this interesting research journey, which prepared me well for my second project on “Differentiable Representations for 2D Curve Networks”. Moreover, I would like to thank my team fellows Charuka Bandara, Brittney Fahnestock, and Sachin Kishan for sharing interesting papers which I am planning to read after SGI!

Bibliography

Yu, Y., Li, X., & Ji, Y. (2024). On intersections of b-spline curves. Mathematics, 12(9), 1344. https://doi.org/10.3390/math12091344
Silva, L. and Gay Neto, A. (2023). Geometry reconstruction based on arc splines with application to wheel-rail contact simulation. Engineering Computations, 40(7/8), 1889-1920. https://doi.org/10.1108/ec-11-2022-0666
Yuksel, C. (2020). A class of c 2 interpolating splines. ACM Transactions on Graphics, 39(5), 1-14. https://doi.org/10.1145/3400301
Zorich, V. A. (2015). Mathematical Analysis I. Springer. https://doi.org/10.1007/978-3-662-48792-1
Shilov, G. E. (1996). Elementary Functional Analysis (2nd edition). Dover.
MacQueen, M. (1936). On the principal join of two curves on a surface. American Journal of Mathematics, 58(3), 620. https://doi.org/10.2307/2370980
Sederberg, T. (2012). Computer Aided Geometric Design. Faculty Publications. https://scholarsarchive.byu.edu/facpub/1

Project github repo: (link)

Geometric data refers to information that describes the shape, position, and properties of objects in space. It includes the following key components: curves, surfaces, meshes, volumes ..etc ↩︎
In this article, when we say curves, we usually refer to parametric curves. However, parametrized curves are not the same as curves in general ↩︎
The term “polygonal approximation” should not be taken too
literally; The term suggests that the Euclidean distance between two points $p$ and $q$ should be the “straight-line” distance between them. ↩︎

Post author By ehsan.ali
Post date August 8, 2024
Tags Math, Research, splines

Research

A Study on Surface Reconstruction from Signed Distance Data Part 1: SDFs

Post author By ar.bb
Post date August 6, 2024

Primary mentors: Silvia Sellán, University of Toronto and Oded Stein, University of Southern California

Volunteer assistant: Andrew Rodriguez

Fellows: Johan Azambou, Megan Grosse, Eleanor Wiesler, Anthony Hong, Artur, Sara

The definition of SDFs using ducks

Suppose that you took a piece of metal wire, twisted into some shape you like, let’s make it a flat duck, and then put it on a little pond. The splash of water made by said wire would propagate as a wave all across the pond, inside and outside the wire. If the speed of the wave were $1 m/s $, you’d have, at every instant, a wavefront telling you exactly how far away a given point is from the wire.

This is the principle behind a distance field, a scalar valued field $ d(\mathbf{x}) $ whose value at any given point $\mathbf{x}$ is given by $ d(\mathbf{x}, S) = \min_{\mathbf{y}}d(\mathbf{x}, \mathbf{y})$, where $ S $ is a simpled closed curve. To make this a signed distance field, we can distinguish between the interior and exterior of the shape by adding a minus sign in the interior region.

Suppose we were to define sdf without knowing the boundary $S$; given a region $ \Omega $ and a surface $ S $, whose inside is defined as the closure of $ \overline{\Omega} = A $. A function $ f : \mathbb{R}^n \rightarrow \mathbb{R} $ is said to be a signed distance function if for every point $ \mathbf{x} \in \Omega $ , we have eikonality $ | \nabla f | = 1 $ and closest point condition:
\[ f( \mathbf{x} – f(\mathbf{x}) \nabla f(\mathbf{x})) = 0 \]

The statement reads very simply: if we have a point $ \mathbf{x} $, and take the difference vector to the normal of the level set at hand times the distance, it should take us to the point $ \mathbf{x} $ closest to the zero level set.

This second definition is equivalent to the first one by an observation that eikonality implies $1$-Lipschitz property of the function $f$ and it is also used for the last step of the following derivation: $\exists q \in S:=f^{-1}(0)$
$$
p-f(p) \nabla f(p)=q \Longrightarrow p-q=f(p) \nabla f(p) \Longrightarrow|p-q|=|f(p)|
$$

This combined with $1$-Lipschitz property gives a proof of contradiction for characterization of sdf by eikonality and closest point condition.

Our project was focused on the SDF-to-Surface reconstruction methods, which is, given a SDF, how can we make a surface out of it? I’ll first tell you about the much simpler problem, which is Surface-to-SDF.

Our SDFs

To lay the foundation, we started with the 2D case: SDFs in the plane from which we can extract polylines. Before performing any tests, we needed a database of SDFs to work with. We created our SDFs using two different skillsets: art and math.

The first of our SDFs were hand-drawn, using an algorithm that determined the closest perpendicular distance to a drawn image at the number of points in a grid specified by the resolution. This grid of distance values was then converted into a visual representation of these distance values – a two-dimensional SDF. Here are some of our hand-drawn SDFs!

The second of our SDFs were drawn using math! Known as exact SDFs, we determined these distances by splitting the plane into regions with inequalities and many AND/OR statements. Each of these regions was then assigned an individual distance function corresponding to the geometry of that space. Here are some of our exact SDF functions and their extracted polylines!

Exact vs. Drawn

Exact Signed Distance Functions (SDFs) provide precise, mathematically rigorous measurements of the shortest distance from any point to the nearest surface of a shape, derived analytically for simple forms and through more complex calculations for intricate geometries. These are ideal for applications demanding high precision but can be computationally intensive. Conversely, drawn or approximate SDFs are numerically estimated and stored in discretized formats such as grids or textures, offering faster computation at the cost of exact accuracy. These are typically used in digital graphics and real-time applications where speed is crucial and minor inaccuracies in distance measurements are acceptable. The choice between them depends on the specific requirements of precision versus performance in the application.

Operations on SDFs

For two SDFs, $f_1(x)$ and $f_2(x)$, representing two shapes, we can define the following operations from constructive geometry:

Union
$$
f_{\cup}(x) = \min(f_1(x), f_2(x))
$$

Intersection
$$
f_{\cap}(x) = \max(f_1(x), f_2(x))
$$

Difference
$$
f_{\setminus}(x) = \max(f_1(x), -f_2(x))
$$

We used these operations to create the following new sdfs: a cat and a tessellation by hexagons.

Marching Squares Reconstructions with Multiple Resolutions

The marching_squares algorithm is a contouring method used to extract contour lines (isocontours) or curves from a two-dimensional scalar field (grid). It’s the 2D analogue of the 3D Marching Cubes algorithm, widely used for surface reconstruction in volumetric data. When applied to Signed Distance Functions (SDFs), marching_squares provides a way to visualize the zero level set (the exact boundary) of the function, which represents the shape defined by the SDF. Below are visualizations of various SDFs which we passed into marching squares with varied resolutions in order to test how well it reconstructs the original shape.

sdfs and reconstructions in three different resolutions.

SDFs in 3D Geometry Processing

While the work we have done has been 2D in our project, signed-distance functions can be applied to 3D geometry meshes in the same way. Below are various examples from recent papers that use signed distance functions in the 3D setting. Take for example marching cubes, a method for surface reconstruction in volumetric data developed by Lorensen and Cline in 1987 that was originally motivated by medical imaging. If we start with an implicit function, then the marching cubes algorithm works by “marching” over a uniform grid of cubes and then locating the surface of interest by seeing where the surface intersects with the cubes, and adding triangles with each iteration such that a triangle mesh can ultimately be formed via the union of each triangle. A helpful visualization of this algorithm can be found here.

Take for example our mentors Silvia Sellán and Oded Stein’s recent paper “Reach For the Spheres:Tangency-Aware Surface Reconstruction of SDFs” with C. Batty which introduces a new method for reconstructing an explicit triangle mesh surface corresponding to an SDF, an algorithm that uses tangency information to improve reconstructions, and they compare their improved 3D surface reconstructions to marching cubes and an algorithm called Neural Dual Contouring. We can see from their figure below that their SDF and tangency-aware and reconstruction algorithm is more accurate than marching cubes or NDCx.

(From Sellan et al, 2024) The Reach for the Spheres method that using tagency information for more accurate surface reconstructions can be seen on both 10^3 and 50^3 SDF grids as being more accurate than competing methods of marching cubes and NDCx. They used features of SDF and the idea of tangency constraints for this novel algorithm.

Another SGI mentor, Professor Keenan Crane, also works with signed distance functions. Below you can see in his recent paper “A Heat Method for Generalized Signed Distance” with Nicole Feng, they introduce a novel method for SDF approximation that completes a shape if the geometric structure of interest has a corrupted region like a hole.

(From Feng and Crane, 2024): In this figure, you can see that the inputted curves in magenta are broken, and the new method introduced by Feng and Crane works by computing signed distances from the broken curves, allowing them to ultimately generate an approximated SDF for the corrupted geometry where other methods fail to do this.

Clearly, SDFs are beautiful and extremely useful functions for both 2D and 3D geometry tasks. We hope this introduction to SDFs inspires you to study or apply these functions in your future geometry-related research.

References

Sellán, Silvia, Christopher Batty, and Oded Stein. “Reach For the Spheres: Tangency-aware surface reconstruction of SDFs.” SIGGRAPH Asia 2023 Conference Papers. 2023.

Feng, Nicole, and Keenan Crane. “A Heat Method for Generalized Signed Distance.” ACM Transactions on Graphics (TOG) 43.4 (2024): 1-19.

Marschner, Zoë, et al. “Constructive solid geometry on neural signed distance fields.” SIGGRAPH Asia 2023 Conference Papers. 2023.

Post author By ar.bb
Post date August 6, 2024

Research Tutorials

An Introduction to RXMesh

Post author By sachinkishan02
Post date August 2, 2024

This post serves as a tutorial for using RXMesh.

RXMesh is a framework for processing triangle mesh on the GPU. It allows developers to easily use the GPU’s massive parallelism to perform common geometry processing tasks.

This tutorial will cover the following topics:

Setting up RXMesh
RXMesh basic workflow
Writing kernels for queries
Example: Visualizing face normals

Setting up RXMesh

For the sake of this tutorial, you can set up a fork of RXMesh and then create your own small section to implement your own tutorial examples in the fork.

Go to https://github.com/owensgroup/RXMesh and fork the repository. You can create your own directory with a simple CUDA script to start using RXMesh. To do so, go to https://github.com/owensgroup/RXMesh/blob/main/apps/CMakeLists.txt in your fork and add a line:

add_subdirectory(introduction)

In the apps directory on your local fork, create a folder called “introduction”. There should be two files in this folder.

A .cu file to run the code. You can call this introduction.cu. To start, it can have the following contents:

#include "rxmesh/query.cuh"
#include "rxmesh/rxmesh_static.h"

#include "rxmesh/matrix/sparse_matrix.cuh"

using namespace rxmesh;

int main(int argc, char** argv)
{
    Log::init();

    const uint32_t device_id = 0;
    cuda_query(device_id);

    RXMeshStatic rx(STRINGIFY(INPUT_DIR) "sphere3.obj");

#if USE_POLYSCOPE
    polyscope::show();
#endif
}

2. A CMake file called CMakeLists.txt. It must have the following contents:

add_executable(introduction)

set(SOURCE_LIST
    introduction.cu	
)

target_sources(introduction
    PRIVATE
    ${SOURCE_LIST}
)

set_target_properties(introduction PROPERTIES FOLDER "apps")

set_property(TARGET ARAP PROPERTY CUDA_SEPARABLE_COMPILATION ON)

source_group(TREE ${CMAKE_CURRENT_LIST_DIR} PREFIX "introduction" FILES ${SOURCE_LIST})

target_link_libraries(introduction
    PRIVATE RXMesh
)

#gtest_discover_tests( introduction )

Once you have done the above steps, building the project with CMake is simple. Refer to the README in https://github.com/owensgroup/RXMesh to know the requirements and steps to build the project for your device.

Once the project has been set up, we can begin to test and run our introduction.cu file. If you were to run it, you should get an output of a 3D sphere.

This means the project set up has been successful, we can now begin learning RXMesh.

RXMesh basic workflow

RXMesh provides custom data structures and functions to handle various low-level computational tasks that would otherwise require low-level implementation. Let’s explore the fundamentals of how this workflow operates.

To start, look at this piece of code.

int main(int argc, char** argv)
{
    Log::init();
    const uint32_t device_id = 0;
    rxmesh::cuda_query(device_id);

    rxmesh::RXMeshStatic rx(STRINGIFY(INPUT_DIR) "dragon.obj");
    auto polyscope_mesh = rx.get_polyscope_mesh();

    auto vertex_pos   = *rx.get_input_vertex_coordinates();
    auto vertex_color = *rx.add_vertex_attribute<float>("vColor", 3);

    rx.for_each_vertex(rxmesh::DEVICE,
    [vertex_color, vertex_pos] __device__(const rxmesh::VertexHandle vh)
    {
            vertex_color(vh, 0) = 0.9;
            vertex_color(vh, 1) = vertex_pos(vh, 1);
            vertex_color(vh, 2) = 0.9;
    });

    vertex_color.move(rxmesh::DEVICE, rxmesh::HOST);

    polyscope_mesh->addVertexColorQuantity("vColor", vertex_color);
    polyscope::show();

    return 0;
}

The above is a slightly simplified version of what a for_each computation program would look like using RXMesh. for_each involves accessing every mesh element of a specific type (vertex, edge, or face) and performing some process on/with it.

In the code example above, the computation adjusts the green component of the vertex’s color based on the vertex’s Y coordinate in 3D space.

But how do we comprehend this code? We will look at it line by line:

    rxmesh::RXMeshStatic rx(STRINGIFY(INPUT_DIR) "dragon.obj");

The above line declares the RXMeshStatic object. Static here stands for a static mesh (one whose connectivity remains constant for the duration of the program’s runtime). Using the RXMesh’s Input directory, we can use some meshes that come with it. In this case, we pass in the dragon.obj file, which holds the mesh data for a dragon.

    auto polyscope_mesh = rx.get_polyscope_mesh();

This line returns the data needed for Polyscope to display the mesh along with any attribute content we add to it for visualization.

    auto vertex_pos = *rx.get_input_vertex_coordinates();

This line is a special function that directly gives us the coordinates of the input mesh.

auto vertex_color = *rx.add_vertex_attribute<float>("vColor", 3);

This line gives vertex attributes called “vColor”. Attributes are simply data that lives on top of vertices, edges, or faces. To learn more about how to handle attributes in RXMesh, check out this. In this case, we associate three float-point numbers to each vertex in the mesh.

rx.for_each_vertex(rxmesh::DEVICE,
    [vertex_color, vertex_pos] __device__(const rxmesh::VertexHandle vh)
    {
            vertex_color(vh, 0) = 0.9;
            vertex_color(vh, 1) = vertex_pos(vh, 1);
            vertex_color(vh, 2) = 0.9;
    });

The above lines represent a lambda function that is utilized by the for_each computation. In this case, for_each_vertex accesses each of the vertices of the mesh associated with our dragon. We pass in the arguments:

rxmesh::DEVICE – to let RXMesh know this is happening on the device i.e., the GPU.
[vertex_color, vertex_pos] – represents the data we are passing to the function.
(const rxmesh::VertexHandle vh) – is the handle that is used in any RXMesh computation. A handle allows us to access data associated to individual geometric elements. In this case, we have a handle that allows us to access each vertex in the mesh.

What do these lines mean?

            vertex_color(vh, 0) = 0.9;
            vertex_color(vh, 1) = vertex_pos(vh, 1);
            vertex_color(vh, 2) = 0.9;

These 3 lines bring together a lot of the declarations made earlier. Through the kernel, we are accessing the attribute data we defined before. Since we need to know which vertex we are accessing its color, we pass in vh as the first argument. Since a vertex has three components (standing for RGB), we also need to pass which index in the attribute’s vector (you can think of it as a 1D array too) we are accessing. Hence vertex_color(vh, 0) = 0.9; which stands for “in the vertex color associated with the handle vh (which for the kernel represents a specific vertex on the mesh), the value of the first component is 0.9”. Note that this “first component” represents red for us.

What about vertex_color(vh, 1) = vertex_pos(vh, 1)? This line, similar to the previous one, is accessing the second component associated with the color, in the vertex the handle is associated with.

But what is on the right-hand side? We are accessing vertex_pos (our coordinates of each vertex in the mesh) and we are accessing it the same way we access our color. In this case, the line is telling us that we are accessing the 2nd positional coordinate (y coordinate) associated with our vertex (that our handle gives to the kernel).

    vertex_color.move(rxmesh::DEVICE, rxmesh::HOST);

This line moves the attribute data from the GPU to the CPU.

polyscope_mesh->addVertexColorQuantity("vColor", vertex_color);

This line uses Polyscope to add a “quantity” to Polyscope’s visualization of the mesh. In this case, when we add the VertexColorQuantity and pass vertex_color, Polyscope will now visualize the per-vertex color information we calculated in the lambda function.

    polyscope::show();

We finally render the entire mesh with our added quantities using the show() function from Polyscope.

Throughout this basic workflow, you may have noticed it works similarly to any other program where we receive some input data, perform some processing, and then render/output it in some form.

More specifically, we use RXMesh to read that data, set up our attributes, and then pass that into a kernel to perform some processing. Once we move that data from the device to the host, we can either perform further processing or render it using Polyscope.

It is important to look at the kernel as computing per geometric element. This means we only need to think of computation on a single mesh element of a certain type (i.e., vertex, edge, or face) since RXMesh then takes this computation and runs it in parallel on all mesh elements of that type.

Writing kernels for queries

While our basic workflow covers how to perform for_each operation using the GPU, we may often require geometric connectivity information for different geometry processing tasks. To do that, RXMesh implements various query operations. To understand the different types of queries, check out this part of the README.

We can say there are two parts to run a query.

The first part consists of creating the launchbox for the query, which defines the threads and (shared) memory allocated for the process along with calling the function from the host to run on the device.

The second part consists of actually defining what the kernel looks like.

We will look at these one by one.

Here’s an example of how a launchbox is created and used for a process where we want to find all the vertex normals:

// Vertex Normal
auto vertex_normals = rx.add_vertex_attribute("vNormals", 3);
constexpr uint32_t CUDABlockSize = 256;

rxmesh::LaunchBox<CUDABlockSize> launch_box;

rx.prepare_launch_box(
  {rxmesh::Op::FV},
  launch_box,
  (void*)compute_vertex_normal<float,CUDABlockSize>);

compute_vertex_normal<float, CUDABlockSize>
<<<launch_box.blocks, 
   launch_box.num_threads, 
   launch_box.smem_bytes_dyn>>>
(rx.get_context(), vertex_pos, *vertex_normals);

vertex_normals->move(rxmesh::DEVICE, rxmesh::HOST);

Notice how, in replacing the for_each part of our basic workflow, we instead declare the launchbox and call our function compute_vertex_normal (which we will look at next) from our main function.

We must also define the kernel which will run on the device.

template <typename T, uint32_t blockThreads>
__global__ static void compute_vertex_normal(const rxmesh::Context      context,
rxmesh::VertexAttribute<T> coords,
rxmesh::VertexAttribute<T> normals)
{


    auto vn_lambda = [&](FaceHandle face_id, VertexIterator& fv){
        // get the face's three vertices coordinates
        glm::fvec3 c0(coords(fv[0], 0), coords(fv[0], 1), coords(fv[0], 2));
        glm::fvec3 c1(coords(fv[1], 0), coords(fv[1], 1), coords(fv[1], 2));
        glm::fvec3 c2(coords(fv[2], 0), coords(fv[2], 1), coords(fv[2], 2));

        // compute the face normal
        glm::fvec3 n = cross(c1 - c0, c2 - c0);

        // the three edges length
        glm::fvec3 l(glm::distance2(c0, c1),
                     glm::distance2(c1, c2),
                     glm::distance2(c2, c0));

        // add the face's normal to its vertices
        for (uint32_t v = 0; v < 3; ++v) {      
// for every vertex in this face
            for (uint32_t i = 0; i < 3; ++i) {  
// for the vertex 3 coordinates
                atomicAdd(&normals(fv[v], i), n[i] / (l[v] + l[(v + 2) % 3]));
            }
        }
    };

    auto block = cooperative_groups::this_thread_block();

    Query<blockThreads> query(context);
    ShmemAllocator      shrd_alloc;
    query.dispatch<Op::FV>(block, shrd_alloc, vn_lambda);
}

A few things to note about our kernel

We pass in a few arguments to our kernel.
- We pass the context which allows RXMesh to access the data structures on the device.
- We pass in the coordinates of each vertex as a VertexAttribute
- We pass in the normals of each vertex as an attribute. Here, we accumulate the face’s normal on its three vertices.
The function that performs the processing is a lambda function. It takes in the handles and iterators as arguments. The type of the argument will depend on what type of query is used. In this case, since it is an FV query, we have access to the current face(face_id) the thread is acting on as well as the vertices of that face using fv.
Notice how within our lambda function, we do the same as before with our for_each operation, i.e., accessing the data we need using the attribute handle and processing it for some output for our attribute.
Outside our lambda function, notice how we need to set some things up with regards to memory and the type of query. After that, we call the lambda function to make it run using query.dispatch

Visualizing face normals

Now that we’ve learnt all the pieces required to do some interesting calculations using RXMesh, let’s try a new one out.

Try to visualize the face normals of a given mesh. This would mean obtaining an output like this for the dragon mesh given in the repository:

Here’s how to calculate the face normals of a mesh:

Take a vertex. Calculate two vectors that are formed by subtracting the selected vertex from the other two vertices.
Take the cross product of the two vectors.
The vector obtained from the cross product is the normal of the face.

Now try to perform the calculation above. We can do this for each face using the vertices it is connected to.

All the information above should give you the ability to implement this yourself. If required, the solution is given below.

The kernel that runs on the device:

template <typename T, uint32_t blockThreads>
global static void compute_face_normal(
const rxmesh::Context context,
rxmesh::VertexAttribute coords, // input
rxmesh::FaceAttribute normals) // output
{
auto vn_lambda = [&](FaceHandle face_id, VertexIterator& fv) {
    // get the face's three vertices coordinates

    glm::fvec3 c0(coords(fv[0], 0), coords(fv[0], 1), coords(fv[0], 2));
    glm::fvec3 c1(coords(fv[1], 0), coords(fv[1], 1), coords(fv[1], 2));
    glm::fvec3 c2(coords(fv[2], 0), coords(fv[2], 1), coords(fv[2], 2));

    // compute the face normal
    glm::fvec3 n = cross(c1 - c0, c2 - c0);
    normals(face_id, 0) = n[0];
    normals(face_id, 1) = n[1];
    normals(face_id, 2) = n[2];
};

auto block = cooperative_groups::this_thread_block();
Query<blockThreads> query(context);
ShmemAllocator shrd_alloc;
query.dispatch<Op::FV>(block, shrd_alloc, vn_lambda);

}

The function call from main on HOST:

int main(int argc, char** argv)
{
    Log::init();

    const uint32_t device_id = 0;
    cuda_query(device_id);

    rxmesh::RXMeshStatic rx(STRINGIFY(INPUT_DIR) "sphere3.obj");

    auto vertex_pos = *rx.get_input_vertex_coordinates();
    auto face_normals = rx.add_face_attribute<float>("fNorm", 3);

    constexpr uint32_t CUDABlockSize = 256;
    LaunchBox<CUDABlockSize> launch_box;
  
    compute_face_normal<float, CUDABlockSize>
    <<<launch_box.blocks,
    launch_box.num_threads,
    launch_box.smem_bytes_dyn>>>
    (rx.get_context(), vertex_pos, *face_normals);
    
    face_normals->move(DEVICE, HOST);

    rx.get_polyscope_mesh()->addFaceVectorQuantity("fNorm", *face_normals);

#if USE_POLYSCOPE
    polyscope::show();
#endif
}

As an extra example, try the following:

Color a vertex depending on how many vertices it is connected to.
Test the above on different meshes and see the results. You can access different kinds of mesh files from the “Input” folder in the RXMesh repo and add your own meshes if you’d like.

By the end of this, you should be able to do the following:

Know how to create new subdirectories in the RXMesh Fork
Know how to create your own for_each computations on the mesh
Know how to create your own queries
Be able to perform some basic geometric analyses’ using RXMesh queries

References

More information can be found in RXMesh GitHub Repository.

This blog was written by Sachin Kishan during the SGI 2024 Fellowship as one of the outcomes of a two week project under the mentorship of Ahmed Mahmoud and support of Supriya Gadi Patil as teaching assistant.

Post author By sachinkishan02
Post date August 2, 2024
Tags GPU, RXMesh, SGI, sgi2024

Research

Geometry Processing from a Mathematics Education Major’s Eyes- The First Three Weeks

Post author By bfahnest
Post date July 30, 2024

Hi! I am Brittney Fahnestock, and I am a rising senior at the State University of New York at Oswego. This is one of the smaller, less STEM-based state schools in New York. That being said, I am a mathematics education and mathematics double major. Most of my curriculum has been pedagogy-based, teaching mathematics and pure mathematics topics like algebraic topology.

So, as you might have read, that does not seem to have applied mathematics, coding, or geometry processing experience. If you are thinking that, you are right! I have minimal experience with Python, PyTorch, and Java. All of my experience is either based on statistical data modeling or drawing cute pictures in Java. So again, my experience has not been much help with what was about to take place!

The introduction week was amazing, but hard. It helped me catch up information-wise since I had minimal experience with geometry processing. However, the exercises were time-consuming and challenging. While I felt like I understood the math during the lectures, coding up that math was another challenge. The exercise helped to make all of the concepts that were lectured about more concrete. While I did not get them all working perfectly, the reward was in trying and thinking about how I could get them to work or how to improve my code.

Now, onto what my experience was like during the research! In the first week, we worked with Python. I did a lot more of the coding than expected. I felt lost many times with what to do, but in our group, we were trying to get out of our comfort zones. Getting out of our comfort zones allows us to grow as researchers and as humans. As a way to make sure we were productive, we did partner coding. My partner knew a lot more about coding than I did. This was a great way to learn more about the topic with guidance!

We worked with splines, which of course was new for me, but they were interesting. Splines not only taught me about coding and geometry but also the continuity of curves and surfaces. The continuity was my favorite part. We watched a video explaining how important it is for there to be strong levels of continuity for video games, especially when the surfaces are reflective.

Week two was a complete change of pace. The language used was C++, but no one in the group had any experience with C++. Our mentor decided that we would guide him with the math and coding, but he would worry about the syntax specifics of C++. I guess this was another form of partner/group coding. While it exposed us to C++, our week was aimed at helping us understand the topics and allowing us to explore what we found interesting.

We worked with triangle marching and extracting iso-values with iso-segmentation and simplices. I had worked with simplicies in an algebraic topology context and found it super interesting to see the overlap in the two fields.

I have an interest in photography, so the whole week I saw strong connections to a passion of mine. This helped me to think of new questions and research ideas. While a week was not quite long enough to accomplish our goal, we plan on meeting after SGI to continue our work!

I can not wait for the last three weeks of SGI so I can continue to learn new ideas about geometry processing and research!

Post author By bfahnest
Post date July 30, 2024
Tags iso-segmentation, sgi2024, splines

Math Research

A Deeper Understanding OpenAI’s CLIP Model

Post author By kris.chebo
Post date July 28, 2024

Author: Krishna Chebolu
Teammates: Bethlehem Tassew and Kimberly Herrera
Mentor: Dr. Ankita Shukla

Introduction

For the past two weeks, our project team mentored by Dr. Ankita Shukla set out to understand the inner workings of OpenAI’s CLIP model. Specifically, we were interested in gaining a mathematical understanding of feature spaces’ geometric and topological properties.

OpenAI’s CLIP (Contrastive Language-Image Pre-Training) is a versatile and powerful model designed to understand and generate text and images. CLIP is trained to connect text and images by learning from a large dataset of images paired with their corresponding textual descriptions. The model is trained using a contrastive learning approach, where it learns to predict which text snippet is associated with which image from a set of possible pairs. This allows CLIP to understand the relationship between textual and visual information.

Figure 1: OpenAI’s CLIP architecture as it appears in the paper. CLIP pre-trains an image encoder and a text encoder to predict which images were paired with which texts in our dataset. Then OpenAI uses this behavior to turn CLIP into a zero-shot classifier. We convert all of a dataset’s classes into captions such as “a photo of a dog” and predict the class of the caption CLIP estimates best pairs with a given image.

CLIP uses two separate encoders: a text encoder (based on the Transformer architecture) and an image encoder (based on a convolutional neural network or a vision transformer). Both encoders produce embeddings in a shared latent space (also called a feature space). By aligning text and image embeddings in the same space, CLIP can perform tasks that require cross-modal understanding, such as image captioning, image classification with natural language labels, and more.

CLIP is trained on a vast dataset containing 400 million image-text pairs collected online. This extensive training data allows it to generalize across various domains and tasks. One of CLIP’s standout features is its ability to perform zero-shot learning. It can handle new tasks without requiring task-specific training data, simply by understanding the task description in natural language. More information can be found in OpenAI’s paper.

In our attempts to understand the inner workings of the feature spaces, we employed tools from UMAP, persistence homology, subspace angles, cosine similarity matrices, and Wasserstein distances.

Our Study – Methodology and Results

All of our teammates started with datasets that contained image-caption pairs. We classified images into various categories using their captions and embedded them using CLIP. Then we used UMAP or t-SNE plots to visualize their characteristics.

Figure 2: A screenshot of a UMAP of 1000 images from the Flickr 8k dataset from Kaggle divided into five categories (animal, human, sport, nature, and vehicle) is shown. Here we can also observe that the images (colored) are embedded differently than their corresponding captions (gray). Although not shown here, the captions are also clustered around categories.

After this preliminary visualization, we desire to delve deeper. We introduced noise, a Gaussian blur, to our images to test CLIP’s robustness. We added the noise in increments (for example mean = 0, standard deviation = {1,2,3,4,5}) and encoded them as we did the original image-caption pairs. We then made persistence diagrams using ripser. We also followed the same procedure within the various categories to understand how noise impacts not only the overall space but also their respective subspaces. These diagrams for the five categories from the Flickr 8k dataset can be found in this Google Colab notebook.

Figure 3: The same 1000 images from the Flickr 8k dataset with increasing noise are seen above. Visually, no significant difference can be observed. The standard deviation of the Gaussian blur increases from left to right.

Visually, you can observe that there is no significant difference, which attests to CLIP’s robustness. However, visual assurance is not enough. Thus, we used Scipy’s Wasserstein’s distance calculation to note how different each persistence diagram is from the other. Continuing the same Flickr 8k dataset, for each category, we obtain the values shown in Figure 4.

Figure 4: Wasserstein distances in each category. You can see the distance between original images with respect to images with Gaussian blur of std. dev = 1 is not high compared to std. dev = 2 or above. This implies that the original set is not as different from the set of images blurred with std. dev. = 1 as compared to std. dev. = 2 which in turn is not as different as the set of blurred images with std. dev. = 3, and so on. This property holds for all five categories.

Another question to understand is how similar are each of the categories to one another. This question can be answered by calculating the subspace angles. After embedding, each category can be seen as occupying a space that can often be far away from another category’s space– we want to quantify how far away, so we use subspace angles. Results for the Flickr 8k dataset example are shown in Figure 5.

Figure 5: Subspace angles of each category pair in the five categories introduced earlier in the post. All values are displayed in degrees. You can observe that the angle between human- and animal-category images is ~0.92° compared to human and nature: ~2.3°; which makes sense as humans and animals are more similar than humans compared to nature. It is worthwhile to note that the five categories are simplifying the dataset too much as they do not capture the nuances of the captions. More categories or descriptions of categories would lead to higher accuracy in the quantity of the subspace angle.

Conclusion

At the start, our team members were novices in the CLIP model, but we concluded as lesser novices. Through the two weeks, Dr. Shukla supported us and enabled us to understand the inner workings of the CLIP model. It is certainly thrilling to observe how AI around us is constantly evolving, but at the heart of it is mathematics governing the change. We are excited to possibly explore further and perform more analyses.

Post author By kris.chebo
Post date July 28, 2024

Research

Graph-based Optimal Transport for Keypoint Matching: Extended NOCs

Post author By nicolaspigadas14
Post date July 28, 2024

Image 1: Our proposed solution’s pipeline, credits to our -always supportive- mentors, Saleh Mahdi and Dani Velikova!

Introduction

Our project aims to enhance Optimal Transport (OT) solvers by incorporating Graph Neural Networks (GNNs) to address the lack of geometric consistency in feature matching. The main motivation of our study is that a lot of real objects are symmetric and thus impose ambiguity. Traditional OT approaches focus on similarity measures and often neglect important neighboring information in geometric settings, proper in computer vision or graphics problems, resulting in the production of huge noise in pose estimation tasks. To tackle this problem, we hypothetize that when the object is symmetric, there are many correct matches of points around its symmetric axis and we can leverage this fact via Optimal Transport and Graph Learning.

To this end, we propose the pipeline in image 1. On a high level, we extract a 3D point cloud of an object from a 2D scene using the Normalized Object Coordinate Space (NOCs) pipeline [1], downsample the points and pool the features/NOCs embeddings of the removed points into the representative remaining points. Concurrently, we run DGCNN on the 3D ground truth mesh of the object and create a graph; this model is used as a 3D encoder [2]. We combine the aforementioned embeddings into the cost matrix used by the Sinkhorn algorithm [3], which attempts to match the points of the point cloud produced by the NOCs map with the nodes of the graph produced by the DGCNN and then to provide a symmetry estimation. We extend the differentiable Sinkhorn solver to yield one-to-many solutions; a modification we make to fully capture the symmetry of a texture-less object. In the final step, we match pairs of points and pixels and perform symmetric pose estimation via Perspective-n-Point. In this post, we study one part of our proposed pipeline, the extended NOCs, from the 2D image analysis to the extraction of the 3D point set and the pooled feature set.

Extended NOCs pipeline

Image 2: The part of the pipeline we will study in this post: Generating a 3D point cloud of a shape from a chosen 2D scene and downsampling its points. We pool the features of the removed points into remaining (seed) points.

The NOCs pipeline extracts the NOCs map and provides size and pose estimation for each object in the scene. The NOCs map encodes via colours the prediction of the normalized 3D coordinates of the target objects’ surfaces within a bounded unit cube. We extend the NOCs pipeline to create a 3D point cloud of the target object using the extracted NOCs map. We then downsample the point cloud, by fragmenting it using the Farthest Point Sampling (FPS) algorithm and only keeping the resulting seed points. In order to fully encapsulate the information extracted by the NOCs pipeline, we pool the embeddings of the nodes of each fragment into the seed points, by averaging them. This enables our pipeline to leverage the features of the target object’s 2D partial view in its scene.

Data

Images 3, 4, 5: (3) The scene of the selected symmetric and texture-less object. (4) The object in the scene zoomed in. (5) The ground truth of the object, i.e. its exact 3D model.

In order to test our hypothesis, we use an object that is not only symmetric, but also texture-less and pattern-free. The specifications of the object we use are the following:

Dataset: camera, val data
Scene: bundle 00009, scene 0001
Object ids: 02880940, fc77ad0828db2caa533e44d90297dd6e
Link to download: https://github.com/sahithchada/NOCS_PyTorch?tab=readme-ov-file

Extended NOCs pipeline: steps visualised

In this section we visually examine each part of the pipeline and how it processes the target shape. First, we get a 2D image of a 3D scene, which is the input of the NOCs pipeline:

Image 6: The 3D scene to be analyzed.

Running the NOCs pipeline on it, we get the pose and size estimation, as well as the NOCs map for every object in the scene:

Image 7, 8: (7) Pose and size estimation of the objects in the scene. (8) NOCs maps of the objects.

The color encodings in the NOCs map encode the prediction of the normalized 3D coordinates of the target objects’ surfaces, within a bounded a unit cube. We leverage this prediction to reconstruct the 3D point cloud of our target shape, from its 2D representation in the scene. We used a statistical outlier removal method to refine the point cloud and remove noisy points. We got the following result:

Images 9, 10: (9) The 2D NOCs map of the object. (10) The 3D point cloud reconstruction of the 2D NOCs map of the target object.

The extracted 3D point cloud of the target object is then fragmented using the Farthest Point Sampling (FPS) algorithm and the seed (center/representative) points of each fragment are also calculated.

Image 11: The fragmented point cloud of the target object. Each fragment contains its corresponding seed point.

The Sinkhorn algorithm part of our project requires a lot less data points for an optimized performance. Thus, we downsample the 3D point cloud, by only keeping the seed points. In order to capture the information extracted by the NOCs pipeline, we pool the embeddings of the removed features into their corresponding seed points via averaging:

Image 12: The 3D seed point cloud. Into each seed point, we have pooled the embeddings of the removed points of their corresponding fragment.

An overview of each step and the visualised result can be seen below:

Image 13: The scene and object’s visualisation across every step of the pipeline.

Conclusion

In this study, we discussed the information extraction of a 3D object from a 2D scene. In our case, we examined the case of a symmetric object. We downsampled the resulting 3D point cloud in order for it to be effectively handled by later stages of the pipeline, but we made share to encapsulate the features of the removed points via pooling. It will be very interesting to see how every part of the pipeline comes together to predict symmetry!

At this point, I would like to thank our mentors Mahdi Saleh and Dani Velikova, as well as Matheus Araujo for their continuous support and understanding! I started as a complete beginner to these computer vision tasks, but I feel much more confident and intrigued by this domain!

Implementation

repo: https://github.com/NIcolasp14/MIT_SGI-Extended_NOCs/tree/main

References

[1] He Wang, Srinath Sridhar, Jingwei Huang, Julien Valentin, Shuran Song, and Leonidas J. Guibas. Normalized object coordinate space for category-level 6d object pose and size estimation, 2019.
[2] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph cnn for learning on point clouds, 2019.
[3] Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. CoRR, abs/1911.11763, 2019.

Post author By nicolaspigadas14
Post date July 28, 2024
Tags Graph Neural Networks, Optimal Transport, Sinkhorn Algorithm, Symmetry

Research

How to match the “wiggliness” of two shapes

Post author By ar.bb
Post date July 22, 2024

If you have an object which you would like to represent as a mesh, what measure do you use to make sure they’re matched nicely?

Suppose you had the ideal, perfectly round implicit sphere $ || \mathbf{x}|| = c$, and you, as a clever geometry processor, wanted to make a discrete version of said sphere employing a polygonal mesh $ \mathbf{M(V,F)}$. The ideal scenario is that by finely tessellating this mesh into a limit of infinite polygons, you’d be able to approach the perfect sphere. However, due to pesky finite memory, you only have so many polygons you can actually add.

The question then becomes, how do you measure how well the perfect and tessellated spheres match? Further, how would you do it for any shape $ f(\mathbf{x}) $? Is there also a procedure not just for perfect shapes, but also to compare two different meshes? That was the objective of Nicholas Sharp’s research program this week.

The 1D Case

Suppose we had a curve $ A(t):[0,1] \rightarrow \mathbb{R}^n $, and we want to build an approximate polyline curve
$ B(t) $. We want to make our discrete points have the nearest look to our original curve $ A(t) $.

As such, we must choose an error metric measuring how similar the two curves are, and make an algorithm that produces a curve that minimizes this metric. A basic one is simply to measure the total area separating the approximate and real curve, and attempt to minimize it.

The error metric for two curves is the area between them

The distance between the polyline and the curve $ |A(t) – B(t)| $ is what allows us to produce this error functional, that tells us how much or how little they match. That’s all there is to it. But how do we compute such a distance?

Well, one way is the Chamfer Distance of the two sets of points $ ch(A,B) $. It’s defined as the nearest point distance among the two curves, and all you need to figure it out is to draw little circles along one of the curves. Where the circles touch defines their nearest point!

Here $ d(A,B) $ is just the regular Euclidean distance. Now we would only need to minimize a error functional proportional to \[ \partial E(A,B) = \int_{\gamma} ch(A,B) \sim \sum_{i=0}^n dL \ ch(A, B) = 0 \]

Well simple enough, but there’s a couple of problems. The first one is most evident: a “nearest point query” between curves does not actually contain much geometrical information. A simple counterexample is a spikey curve near a relatively flat curve; the nearest point to a large segment of the curve is a single peak!

We’ll see another soon enough, but this immediately brings into question the validity of an attempted approximation using $ ch(A,B) $. Thankfully mathematicians spent a long time idly making up stuff, so we actually have a wide assortment of possible distance metrics to use!

Different error metrics

The first alternative metric is the classic Hausdorff Distance, defined as upper bound of the Chamfer distance. Along a curve, always picking the smallest distances, we can select the largest such distance.

Where we to use such a value, the integral error would be akin to the mean error value, rather than the absolute total area. This does not fix our original shape problem, but it does facilitate computation!

The last contender for our curves is the Fréchet Distance. It is relatively simple: you parametrize two curves $ A $ and $ B $ by some parameter $ t $ such that $A(\alpha(t)) $ and $ B(\beta(t))$, you can picture it as time; it is always increasing. What we do is say that Fréchet distance $ Fr(A,B) $ is the instantaneous distance between two points “sliding along” the two curves:

We can write this as \[ Fr(A, B) = \inf_{\alpha, \beta} \max_{t\in[0,1]} d(A(\alpha(t)), B(\beta(t))) \]

This provides many benefits at the cost of computation time. The first is that it immediately corrects points from one of our curves “sliding back” as in the peak case, since it must be a completely injective function in the sense one point at some parameter $ t $ must be unique.

The second is that it encodes shape data. The shape information is contained in the relative distribution of the parametrization; for example, if we were to choose $ \alpha(t) \approx \ln t $, we’d produce more “control points” at the tail end of the curve and that’d encode information about its derivatives. The implication being that this information, if associated to the continuity $ (C^n) $ of the curve, could imply details of its curvature and such.

Here is a simple example. Suppose we had two very squiggly curves criss-crossing each other:

Now, were we to choose a Chamfer/Hausdorff distance variant, we would actually get a result indicating a fairly low error, because, due to the criss-cross, we always have a small distance to the nearest point:

However, were we to introduce a parametrization along the two curves that is roughly linear, that is, equidistant along arclenghts of the curve, we could verify that their Fréchet distance is very large, since “same moment in time” points are widely distanced:

Great!

The 2D case

In our experiments with Nicholas, we only got as far as generalizing this method to disk topology, open grid meshes, and using the Chamfer approach to them. We first have a flat grid mesh, and apply a warp function $ \phi(\mathbf{x}) $ that distorts it to some “target” mesh.

def warp_function(uv):

    # input: array of shape (n, 2) uv coordinates. (n,3) input will just ignore the z coordinate

    ret = []
    for i in uv:
        ret.append([i[0], i[1], i[0]*i[1]])

    return np.array(ret)

We used a couple of test mappings, such as a saddle and a plane with a logarithmic singularity.

We then do a Chamfer query to evaluate the absolute error by summing all the distances from the parametric surface to the grid mesh. The 3D Chamfer metric is simply the nearest sphere query to a point in the mesh:

def chamfer_distance(A, B):
    ret = 0
    closest_points = []

    for a in range(A.shape[0]):
        min_distance = float('inf')
        closest_point = None
        for b in range(B.shape[0]):
            distance = np.linalg.norm(A[a] - B[b])
            if distance < min_distance:
                min_distance = distance
                closest_point = b
        ret += min_distance
        closest_points.append((a, A.shape[0]+closest_point))
    
    closest_points_vertices = np.vstack([A, B])
    closest_points_indices = np.array(closest_points)

    assert closest_points_indices.shape[0] == A.shape[0] 
and closest_points_indices.shape[1] == 2
    assert closest_points_vertices.shape[0] == A.shape[0] + B.shape[0] and closest_points_vertices.shape[1] == A.shape[1]

    return ret, closest_points_vertices, closest_points_indices

Our main goals were to check if, as the base grid mesh was refined, the error diminished as the approximation increased in resolution, and to see if the relative angle of the grid mesh to the test function significantly altered the error.

The first experiment was validated quite successfully; besides perhaps very pathological cases, the error diminished as we increased the resolution:

And the second one as well, but this will be discussed in a separate blog post.

Application Areas: Chamfer Loss in Action

In machine learning, a loss function is used to measure the difference (i.e. error) between the given input and the neural network’s output. By following this measure, it is possible to have an estimate of how well the neural network models the training data and optimize its weights with respect to this.

For the task of mesh reconstruction, e.g. reconstructing a surface mesh given an input representation such as a point cloud, the loss function aims to measure the difference between the ground truth and predicted mesh. To achieve this, the Chamfer loss (based on the Chamfer distance discussed above) is utilized.

Below, we provide a code snippet for the implementation of Chamfer distance from a widely adopted deep learning library for 3D data, PyTorch3D [1]. The distance can be modified to be used as single or bi-directional (single_directional) and by adopting additional weights.

def chamfer_distance(
    x,
    y,
    x_lengths=None,
    y_lengths=None,
    x_normals=None,
    y_normals=None,
    weights=None,
    batch_reduction: Union[str, None] = "mean",
    point_reduction: Union[str, None] = "mean",
    norm: int = 2,
    single_directional: bool = False,
    abs_cosine: bool = True,

):

    if not ((norm == 1) or (norm == 2)):

        raise ValueError("Support for 1 or 2 norm.")

    x, x_lengths, x_normals = _handle_pointcloud_input(x, x_lengths, x_normals)

    y, y_lengths, y_normals = _handle_pointcloud_input(y, y_lengths, y_normals)

    cham_x, cham_norm_x = _chamfer_distance_single_direction(
        x,
        y,
        x_lengths,
        y_lengths,
        x_normals,
        y_normals,
        weights,
        batch_reduction,
        point_reduction,
        norm,
        abs_cosine,
    )
    if single_directional:
        return cham_x, cham_norm_x

    else:
        cham_y, cham_norm_y = _chamfer_distance_single_direction(

            y,
            x,
            y_lengths,
            x_lengths,
            y_normals,
            x_normals,
            weights,
            batch_reduction,
            point_reduction,
            norm,
            abs_cosine,
        )

        if point_reduction is not None:
            return (
                cham_x + cham_y,
                (cham_norm_x + cham_norm_y) if cham_norm_x is not None else None,
            )
        return (
            (cham_x, cham_y),
            (cham_norm_x, cham_norm_y) if cham_norm_x is not None else None,

        )

When Chamfer is not enough

Depending on the task, the Chamfer loss might not be the best measure to estimate the difference between the surface and reconstructed mesh. This is because in certain cases, the $ k $-Nearest Neighbor (KNN) step will always be taking the difference between the input point $ \mathbf{y} $ and the closest point $ \hat{\mathbf{y}}$ on the reconstructed mesh, leaving aside the actual point where the measure should be taken into account. This could lead a gap between the input point and the reconstructed mesh to stay untouched during the optimization process. Research for machine learning in geometry proposes to adopt different loss functions to mitigate this problem introduce by the Chamfer loss.

A recent work Point2Mesh [2] addresses this issue by proposing the Beam-Gap Loss. For a point $ \hat{\mathbf{y}}$ sampled from a deformable mesh face, a beam is cast in the direction of the deformable face normal. The beam intersection with the point cloud is the closest point in an $\epsilon$ cylinder around the beam. Using a cylinder around the beam instead of a single point as in the case of Chamfer distance helps reduce the gap in a certain region between the reconstructed and input mesh.

The beam intersection (or collision) of a point $ \hat{\mathbf{y}}$ is formally given by $ \mathcal{B}(\hat{\mathbf{y}}) = \mathbf{x}$ and the overall beam-gap loss is calculated by:

\[ \mathcal{L}_b (X, \hat{\mathbf{Y}}) = \sum_{ \hat{y} \in \hat{Y}} |\hat{\mathbf{y}}-\mathcal{B}(\hat{\mathbf{y}})|^2 \]

References

[1] Ravi, Nikhila, et al. “Accelerating 3d deep learning with pytorch3d.” arXiv preprint arXiv:2007.08501 (2020).

[2] Hanocka, Rana, et al. “Point2mesh: A self-prior for deformable meshes.” SIGGRAPH (2020).

[3] Alt, H., Knauer, C. & Wenk, C. Comparison of Distance Measures for Planar Curves. Algorithmica

[4] Peter Schäfer. Fréchet View – A Tool for Exploring Fréchet Distance Algorithms (Multimedia Exposition).

[5] Aronov, B., Har-Peled, S., Knauer, C., Wang, Y., Wenk, C. (2006). Fréchet Distance for Curves, Revisited. In: Azar, Y., Erlebach, T. (eds) Algorithms

[6] Ainesh Bakshi, Piotr Indyk, Rajesh Jayaram, Sandeep Silwal, and Erik Waingarten. 2024. A near-linear time algorithm for the chamfer distance. Proceedings of the 37th International Conference on Neural Information Processing Systems.

Post author By ar.bb
Post date July 22, 2024
Tags metric

SDF / Resolution	100×100	30×30	8×8
leaf
leaf marching_squares
uneven capsule
uneven capsule marching_squares
snowman
snowman marching_squares
square
square marching_squares

Collaborators: Nicolas Pigadas

Introduction

Precursors to SLAM

SLAM Dunk – A Culmination (some real Vince Carter stuff)

Key Milestones

From Robotics to Endoscopy (Medical Vision)

Endoscopy SLAMs – What Our Group Looked At

Conclusion

Hausdorff and Chamfer Distance

Area Method for Error

Inherent Error Function

The definition of SDFs using ducks

Our SDFs

Exact vs. Drawn

Operations on SDFs

Marching Squares Reconstructions with Multiple Resolutions

SDFs in 3D Geometry Processing

Setting up RXMesh

RXMesh basic workflow

Writing kernels for queries

Visualizing face normals

References

Author: Krishna CheboluTeammates: Bethlehem Tassew and Kimberly HerreraMentor: Dr. Ankita Shukla

Introduction

Our Study – Methodology and Results

Conclusion

The 1D Case

Different error metrics

The 2D case

Application Areas: Chamfer Loss in Action

When Chamfer is not enough

Author: Krishna Chebolu
Teammates: Bethlehem Tassew and Kimberly Herrera
Mentor: Dr. Ankita Shukla