Categories
Demo SGI research projects

Self-similarity loss for shape descriptor learning in correspondence problems

By Faria Huq, Kinjal Parikh,  Lucas Valença

During the 4th week of SGI, we worked closely with Dr. Tal Shnitzer to develop an improved loss function for learning functional maps with robustness to symmetric ambiguity. Our project goal was to modify recent weakly-supervised works that generated deep functional maps to make them handle symmetric correspondences better.

Introduction:

  • Shape correspondence is a task that has various applications in geometry processing, computer graphics, and computer vision – quad mesh transfer, shape interpolation, and object recognition, to name a few. It entails computing a mapping between two objects in a geometric dataset. Several techniques for computing shape correspondence exist – functional maps is one of them. 
  • A functional map is a representation that can map between functions on two shapes’ using their eigenbases or features (or both). Formally, it is the solution to \(\mathrm{arg}\min_{C_{12}}\left\Vert C_{12}F_1-F_2\right\Vert^2\), where \(C_{12}\) is the functional map from shape \(1\) to shape \(2\) and  \(F_1\), \(F_2\) are corresponding functions projected onto the eigenbases of the two shapes, respectively.

Therefore, there is no direct mapping between vertices in a functional map. This concise representation facilitates manipulation and enables efficient inference.

Figure 1: The approach proposed by Sharma and Ovsjanikov may fail on symmetric regions. Notice the hands, which have not been matched correctly due to their symmetric structure.
  • Recently, unsupervised deep learning methods have been developed for learning functional maps. One of the main challenges in such shape correspondence tasks is learning a map that differentiates between shape regions that are similar (due to symmetry). We worked on tackling this challenge.

Background

  • We build upon the state-of-the-art work “Weakly Supervised Deep Functional Map for Shape Matching” by Sharma and Ovsjanikov, which learns shape descriptors from raw 3D data using a PointNet++ architecture. The network’s loss function is based on regularization terms that enforce bijectivity, orthogonality, and Laplacian commutativity.
  • This method is weakly supervised because the input shapes must be equally aligned, i.e., share the same 6DOF pose. This weak supervision is required because PointNet-like feature extractors cannot distinguish between left and right unless the shapes share the same pose.
  • To mitigate the same-pose requirement, we explored adding another component to the loss function Contextual Loss by Mechrez et al. Contextual Loss is high when the network learns a large number of similar features. Otherwise, it is low. This characteristic promotes the learning of global features and can, therefore, work on non-aligned data.

Work

  1. Model architecture overview: As stated above, our basic model architecture is similar to “Weakly Supervised Deep Functional Map for Shape Matching” by Sharma and Ovsjanikov. We use the basic PointNet++ architecture and pass its output through a \(4\)-layer ResNet model. We use the output from ResNet as shape features to compute the functional map. We randomly select \(4000\) vertices and pass them as input to our PointNet++ architecture.
  2. Data augmentation: We randomly rotated the shapes of the input dataset around the “up” axis (in our case, the \(y\) coordinate). Our motivation for introducing data augmentation is to make the learning more robust and less dependent on the data orientation.
  3. Contextual loss: We explored two ways of adding contextual loss as a component:
    1. Self-similarity: Consider a pair of input shapes (\(S_1\), \(S_2\)) of features \(P_1\) with \(P_2\) respectively. We compute our loss function as follows:
      \(L_{CX}(S_1, S_2) = -log(CX(P_1, P_1)) – log(CX(P_2, P_2))\),
      where \(CX(x, y)\) is the contextual similarity between every element of \(x\) and every element of \(y\), considering the context of all features in \(y\).
      More intuitively, the contextual loss is applied on each shape feature with itself (\(P_1\) with \(P_1\) and \(P_2\) with \(P_2\)), thus giving us a measure of ‘self-similarity’ in a weakly-supervised way. This measure will help the network learn unique and better descriptors for each shape, thus alleviating errors from symmetric ambiguities.
    2. Projected features: We also explored another method for employing the contextual loss. First, we project the basis \(B_1\) of \(S_1\) onto \(S_2\), so that \(B_{12} = B_ 1 \cdot C_{12}\). Similarly, \(B_{21} = B_2 \cdot C_{21}\). Note that the initial bases \(B_1\) and \(B_2\) are computed directly from the input shapes. Next, we want the projection \(B_{12}\) to get closer to \(B_2\) (the same applies for \(B_{21}\) and \(B_1\)). Hence, our loss function becomes:
      \(L_{CX}(S_1, S_2) = -log(CX(B_{21}, B_1)) – log(CX(B_{12}, B_2))\).
      Our motivation for applying this loss function is to reduce symmetry error by encouraging our model to map the eigenbases using \(C_{12}\) and \(C_{21}\) more accurately.
  4. Geodesic error: For evaluating our work, we use the metric of average geodesic error between the vertex-pair mappings predicted by our models and the ground truth vertex-pair indices provided with the dataset.

Result

We trained six different models on the FAUST dataset (which contains \(10\) human shapes at the same \(10\) poses each). Our training set includes \(81\) samples, leaving out one full shape (all of its \(10\) poses) and one full pose (so the network never sees any shape doing that pose). These remaining \(19\) inputs are the test set. Additionally, during testing we used ZoomOut by Melzi et al.

ModelData AugmentationContextual Loss
BaseFalseNone
BATrueNone
SSFalseSelf Similarity
SSATrueSelf Similarity
PFFalseProjected Features
PFATrueProjected Features
Table 1: We trained 7 models using the settings described in this table
Figure 2: The AUC (area under curve) curves and values with the average geodesic error for each model.

For qualitative comparison purposes, the following GIF displays a mapping result. Overall, the main noticeable visual differences between our work and the one we based ourselves on appeared when dealing with symmetric body parts (e.g., hands and feet).

Figure 3: Comparison between shape correspondence produced by the models described in Table 1.

Still, as it can be seen, the symmetric body parts remain the less accurate mappings in our work too, meaning there’s still much to improve.

Conclusion

We thoroughly enjoyed working on this project during SGI and are looking forward to investigating this problem further, to which end we will continue to work on this project after the program. We want to extend our deepest gratitude to our mentor, Dr. Tal Shnitzer, for her continuous guidance and patience. None of this would be possible without her.

Thank you for taking the time to read this post. If there are any questions or suggestions, we’re eager to hear them!

Categories
Demo SGI research projects

Puzzle-ed

The SGI research program has been structured in a novel way in which each project lasts for one or two weeks only. This gives Fellows the opportunity to work with multiple mentors and in different areas of Geometry Processing. A lot of SGI fellows, including me, had wondered how we would be able to finish the projects in such a short period of time. After two weeks, as I pause my first research project at SGI, I am reminded of Professor Solomon’s remark that a surprising amount of work can be done within 1/2 weeks when guided by an expert mentor.

In this post, I have shared the work I have done under the mentorship of Prof. David Levin over the last two weeks.

Optimal Interlocking Parts via Implicit Shape Optimizations

In this project, we explored how to automatically design jigsaw puzzles such that all puzzle pieces are as close to a given input shape \(I\) as possible, while still satisfying the requirement of interlocking.

Lloyd relaxation with Shape Matching metric

As the first step, we needed a rough division of the domain into regions corresponding to puzzle pieces. For this initial division we used Lloyd’s relaxation as it ensured that the pieces would interlock. To create regions similar to the input shape, we employed a new distance metric. This metric is based on shape matching.

Shape Matching metric:

The input shape \(I\) is represented as a collection of points \(V\) on its boundary and is assumed to be of unit size and centered at the origin. Consider a pixel \(p\) and a site \(x\). First the input shape is translated to \(x\) ( \(V \rightarrow V’\)). The value of scaling factor \(s\) that minimizes the distance between \(p\) and the closest point \(V’_i\) on the boundary of the input shape gives us the shape matching metric. \[s^* = \text{arg min} \frac {1} {2}(dist(p, sV’))\]

Pre computing maps

The straightforward implementation for Lloyd’s relaxation that I had been using was iterative and therefore very slow. In fact, testing the algorithm for larger domains or complicated shapes that needed dense sampling of boundary points was infeasible. Therefore, optimizing the algorithm was an immediate requirement. I realized that relative to the site \(x\), the distances would vary around it in exactly the same way. So, computing the distance between each pixel and each site separately during each iteration of Lloyd’s relaxation was redundant and could be avoided by pre-computing a distance map for the given object before starting Lloyd’s relaxation.

Enhancing the shape matching metric

In addition to speeding up the algorithm, the distance maps also helped in determining the next step. As can be seen from the figures above, the maps weren’t smooth. These jumps in distance fields occurred because the shape matching metric worked on a set of discrete points. Optimizing the scaling factor for minimizing distance to the closest edge instead of closest point resolved this.

200×200 Map with enhanced Shape Matching metric input shape of a puzzle piece.

Shaping the puzzle pieces (In progress)

The initial regions are obtained from Lloyd’s algorithm with the shape matching metric need to be refined so that the final puzzle pieces are highly similar to the input shape \(I\).

I am currently exploring the use of point set registration methods to find the best transformations on the input shape so that it fits the initial regions. These transformations could then be used to inform the next iteration of Lloyd’s algorithm to refine the regions. This process can be repeated until the region stops changing, giving us our final puzzle pieces.

I have thoroughly enjoyed working on this project, and I am looking forward to the next project I would be working on!

Categories
Demo News

Before the Beginning

While the official start date, 19th of July, is still a couple of days from now, the SGI experience began the very day I received the acceptance letter back in March. In this post I briefly share my thoughts on the journey so far – SGI’s social event, attending the Symposium on Geometry Processing 2021, and my implementation of a paper presented in the conference.

Slack and Virtual Coffees

The SGI slack channel was created at the end of March and has been buzzing with activity ever since. Initially, as everyone introduced themselves, I was proud to see that I am a part of a team that is truly diverse in every aspect – geographically, racially, religiously and even by educational backgrounds!

Shortly after, we started biweekly ‘virtual coffees’ in which two people were randomly paired up to meet. These sessions have been instrumental in refining my future goals. As someone who entered research just as the world closed down and started working from home, I haven’t had the opportunity to visit labs and chat with graduate students during the lunch break or by that water cooler. Speaking with professors, teaching assistants and SGI Fellows has debunked many of my misconceptions regarding graduate school.  Additionally, I also learned a lot about possible career paths in geometry processing and adjacent fields like computer graphics, and about the culture in this community.

Symposium on Geometry Processing

SGP was a comprehensive four-day event including tutorials, paper presentations and talks on the latest research in the field. While all the sessions were amazing, I was especially awed by the keynote ‘Computing Morphing Matter’ by Professor Lining Yao from Carnegie Mellon University. She talked about various techniques that combine geometry, physics, and material science to create objects that can change shape in a pre-determined manner. This ability to morph can be leveraged in various ways and across many industries. To list a few of its usages, it can be used to create a compact product that can morph into the final shape later thus saving packaging material and shipping costs, to develop self-locking mechanisms, and to produce aesthetically appealing shapes. Due to such varied use cases, morphing matter has applications in furniture designing to food production to engineering problems.

Implementing ‘Normal-Driven Spherical Shape Analogies’

In one section of the tutorial ‘An Introduction to Geometry Processing Programming in MATLAB with gptoolbox’ at SGP, the ease with which complex problems can be solved using MATLAB and gptoolbox was demonstrated by implementing research papers. I had never before seen a language/library tutorial that included paper implementations and was delighted at the prospect of turning the tedious process of learning a new tool into an exciting process of implementing a research paper. Eager to test out this new way, I implemented one of the research papers presented at the SGP – ‘Normal-Driven Spherical Shape Analogies’ by Hsueh-Ti Derek Liu and Alec Jacobson.

The paper details how an input shape can be stylized to look like a target shape by using normal driven shape analogies. The process boils down to three steps:

  1. Compute the mapping between a sphere and the target shape.
  2. Using the sphere-target shape mapping as an analogy, find the target normals for the input object.
  3. Generate the stylized output by deforming the input object to approximate the target normals.

When I arrived at code that gave outputs that looked right, I faced an unanticipated problem: Without any error metric, how can you know that it’s working well? After working on this project I definitely have a renewed appreciation for work in stylization.