
Implementing Spectral Conformal Parameterization using RXMesh

Fellows: Sachin Kishan and Diany Pressato

Teaching Assistant: Supriya Gadi Patil

Mentor: Ahmed Mahmoud

Introduction

RXMesh is a framework for processing triangle meshes on the GPU. It allows developers to easily use the GPU’s massive parallelism to perform common geometry processing tasks on triangle meshes. For a preliminary introduction to RXMesh, check out this tutorial blog here.

Our task was to use RXMesh to implement some classic algorithms in geometry processing. This blog details one of the algorithms we implemented: Spectral Conformal Parameterization (SCP).

This blog will discuss the algorithm in terms of its basic concepts, the steps of the algorithm, the implementation using RXMesh, and our results.

Our entire implementation of SCP application in RXMesh can be found here.

Spectral Conformal Parameterization

Parameterization refers to the mapping of a triangle surface mesh in \(\mathbb{R}^{3}\) space to a planar \(\mathbb{R}^{2}\) space. This mapping can then be used in different applications such as texture mapping, where we map values from some other \(\mathbb{R}^{2}\) space (e.g., an image) to the surface of the mesh. In this case, we map color values to the mesh surface.

Mesh parameterization can distort geometric properties of the mesh, such as the size or shape of its elements or the angles between edges. We usually want to preserve these properties so that our intended values (e.g., texture colors) end up in the “correct” positions on the mesh. What is correct is usually up to artistic opinion, and thus a mapping that preserves these properties is desirable. A mapping that preserves angles (and thus shape) is called a conformal mapping.

Mullen et al. (2008) provide a method to perform conformal parameterization of an open surface mesh by modelling the problem as a generalized eigenvalue problem. Starting from an initial guess, we can perform multiple iterations of a general eigenvector solver (i.e., the power method [2]) to determine the eigenvector corresponding to the largest eigenvalue. This eigenvector, in our case, contains the parameterized coordinates of the given mesh, giving us our desired output.

The algorithm for the power method is as follows:

Given \((A-\lambda B)x=0 \)

We want to find \(x\).

Given an initial guess of the eigenvector \(x_{0}\),

for some number of iterations, on iteration \(k\):

  • \(B y_{k+1} = A x_{k}\)
  • \(x_{k+1}= \frac{y_{k+1}}{\left| y_{k+1} \right|} \)

where \(x\) is the desired eigenvector solution.
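For intuition, here is a minimal NumPy sketch of this generalized power iteration on small dense matrices; it is only an illustration (the actual SCP solver works with sparse complex matrices on the GPU), and A, B, and x0 are placeholder inputs:

import numpy as np

def generalized_power_iteration(A, B, x0, iterations=100, tol=1e-4):
    # Repeatedly solve B y_{k+1} = A x_k and normalize, as in the steps above.
    x = x0 / np.linalg.norm(x0)
    prv_norm = 0.0
    for _ in range(iterations):
        y = np.linalg.solve(B, A @ x)    # B y_{k+1} = A x_k
        norm = np.linalg.norm(y)
        x = y / norm                     # x_{k+1} = y_{k+1} / |y_{k+1}|
        if abs(norm - prv_norm) < tol:   # simple convergence check
            break
        prv_norm = norm
    return x

Here \(x\) converges to the eigenvector of \(B^{-1}A\) with the largest-magnitude eigenvalue.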

Implementation

We applied the power method to our application in the following way, based on the formulation given by Mullen et al. (2008) in equation 7:

\([B-\frac{1}{V_{b}}e_{b}e_{b}^{t}]u=\mu L_{c}u \)

To find the eigenvector \(u\), we can use the power method, since the problem has the generalized form \(B y_{k+1} = A x_{k}\). Each iteration can be done in the following way:

\(T2 = e_{b} \cdot u \)
\(T1 = Bu - T2 \, e_{b} \)
\(L_{c}u = T1 \)
\(u = u / \left\| u \right\| \)

where \(u\) holds the parameterized coordinates, and \(T1\) and \(T2\) are temporary variables storing intermediate vector and scalar values. The other variables are explained by Mullen et al. (2008). This loop can be implemented like this using RXMesh:

for (int i = 0; i < iterations; i++)
{
    cuComplex T2 = eb.dot(uv_mat);
    rx.for_each_vertex(
        rxmesh::DEVICE,
        [eb, T2, T1, uv_mat, B] __device__(const rxmesh::VertexHandle vh) mutable {
            // T1 = B * u - T2 * eb (complex arithmetic, per vertex)
            T1(vh, 0) = cuCsubf(cuCmulf(B(vh, vh), uv_mat(vh, 0)),
                                cuCmulf(eb(vh, 0), T2));
        });

    Lc.solve(T1, uv_mat);            // solve Lc * u = T1
    float norm = uv_mat.norm2();
    uv_mat.multiply(1.0f / norm);    // u = u / |u|

    if (std::abs(prv_norm - norm) < 0.0001) {  // convergence check
        break;
    }
    prv_norm = norm;
}

\(T1\) is a vector whose values are per-vertex, which is why we perform a per-vertex computation. \(T2\) is a scalar in which we store the value of the dot product \(e_{b} \cdot u\).

Since our system matrix \(L_{c}\) has dimensions \(2V \times 2V\), we use complex representations of the entries in the vectors and matrices we use. So, for example, our \(L_{c}\) matrix now has dimensions \(V \times V\), where each entry is a complex number (with a real and an imaginary part).

This means subsequent computations must use CUDA functions that operate on complex numbers (e.g., cuCsubf which subtracts complex numbers).

All initial pre-computations only assign to the real part of each entry, except for the calculation of the area term \(A\), used when we compute \(L_{c}=L_{D}-A\). For the area term, we assign values to the imaginary part. \(L_{D}\) represents the standard Laplace-Beltrami operator, which we calculate using the cotangent edge weights and assign to the real part.

After we obtain the eigenvector \(u\) in complex form, we can convert it to a standard V x 2 matrix form and use that for further processing (such as rendering).
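As a small illustration of that last step, here is a NumPy sketch of the unpacking, assuming (as is conventional) that the real part of each entry carries the \(u\) coordinate and the imaginary part the \(v\) coordinate:

import numpy as np

def complex_to_uv(u_complex):
    # Unpack a length-V complex eigenvector into a V x 2 real matrix of
    # (u, v) coordinates: real part -> u, imaginary part -> v.
    return np.column_stack((u_complex.real, u_complex.imag))

# Example: complex_to_uv(np.array([0.1 + 0.2j, -0.3 + 0.05j])) gives a 2 x 2 matrix.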

Results

Using the SCP implementation, we can now apply texturing to our open meshes as shown in the examples below.

The bunny with a checker texture.
The beetle mesh with a checker texture.

References

[1] Mullen, P., Tong, Y., Alliez, P., & Desbrun, M. (2008). Spectral Conformal Parameterization. Computer Graphics Forum, 27(5), 1487–1494. https://doi.org/10.1111/j.1467-8659.2008.01289.x

[2] Kozłowski, W. (1992). The power method for the generalized eigenvalue problem. Roczniki Polskiego Towarzystwa Matematycznego. Seria 3, Matematyka Stosowana/Matematyka Stosowana/Mathematica Applicanda, 21(35). https://doi.org/10.14708/ma.v21i35.1790

This blog was written by Sachin Kishan and Diany Pressato during the SGI 2024 Fellowship as one of the outcomes of a two-week project under the mentorship of Ahmed Mahmoud and the support of Supriya Gadi Patil as a teaching assistant.


Implementing ARAP Deformations using RXMesh

Fellows: Sachin Kishan

Teaching Assistant: Supriya Gadi Patil

Mentor: Ahmed Mahmoud

Introduction

RXMesh is a framework for processing triangle meshes on the GPU. It allows developers to easily use the GPU’s massive parallelism to perform common geometry processing tasks. For a preliminary introduction to RXMesh, check out this tutorial blog here.

Our task was to use RXMesh to implement some classic algorithms in geometry processing. This blog details one of the algorithms we implemented: As-Rigid-As-Possible (ARAP) Surface Modeling [1].

This blog will discuss the algorithm in terms of its basic concepts, the steps of the algorithm, the implementation using RXMesh, and the final results.

Our entire implementation of ARAP deformation in RXMesh can be found here.

As-Rigid-As-Possible Surface Modeling

Sorkine and Alexa (2007) formulated a method where we can retain the shape of an object as much as possible despite deforming one (or a set) of its vertices. This retention of shape is achieved by ensuring that the transformations applied to the mesh after deforming its vertices are rigid. This property of rigidity means that a mesh can only undergo rigid transformations such as translation or rotation, but not stretching or shearing (since these are not rigid transformations).

The algorithm’s focus is to minimize an energy that measures the extent of deformation (dubbed rigidity energy). This energy is minimized when given a specific set of positions or a specific set of rotations. Notice that we cannot minimize both at once, since we need the minimizing rotations in order to minimize the energy over the deformed positions. Thus, we instead solve for both alternately, i.e., solve for the deformed positions assuming the rotations are fixed, then solve for the rotations assuming the positions are fixed.

Using a Laplacian, we can form a sparse system of linear equations which we can then solve to obtain a new set of positions for the mesh. The algorithm works as follows for a single iteration:

  1. Identify the rotation matrix for the given deformed positions which minimizes deformation energy.
  2. Solve for the new positions to minimize the energy, given a set of linear equations.

In this case, the set of linear equations to solve is of the form:

\(Lx=b\)

where \(L\) is the Laplace-Beltrami operator, which can be formed from the cotangent edge weights of the mesh, \(x\) represents the set of new vertex positions we solve for, and \(b\) represents the right-hand side of Equation 8 from Sorkine and Alexa (2007). The derivation is detailed further in the original paper. The \(i\)-th row of \(b\) is given by

\(b_{i} = \sum_{j\in N(i)} \frac{w_{ij}}{2} (R_{i}+R_{j})(p_{i}-p_{j})\)

To further minimize the energy, we can perform multiple iterations of the same steps, using updated vertex positions and consequent rotations per iteration.
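Before moving to the GPU implementation, here is a minimal dense NumPy sketch of this local-global loop for a small mesh. It is only a sketch of the alternation described above, not the RXMesh implementation: P_ref and P are the rest-pose and current n x 3 vertex positions, W is an assumed symmetric n x n cotangent weight matrix, L the corresponding Laplacian, and fixed a boolean mask of constrained vertices.

import numpy as np

def fit_rotation(S):
    # Closest rotation to the 3x3 covariance S via SVD (the local step).
    U, _, Vt = np.linalg.svd(S)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # flip one column to keep a proper rotation
        U[:, -1] *= -1
        R = Vt.T @ U.T
    return R

def arap_iterations(P_ref, P, W, L, fixed, iterations=10):
    n = P_ref.shape[0]
    A = L.copy()
    A[fixed, :] = 0.0
    A[fixed, fixed] = 1.0                        # identity rows for constraints
    for _ in range(iterations):
        # Local step: per-vertex best-fit rotation (Eqs. 5-6 of the paper).
        R = np.empty((n, 3, 3))
        for i in range(n):
            nbrs = np.nonzero(W[i])[0]
            E = P_ref[i] - P_ref[nbrs]           # rest-pose edges
            E2 = P[i] - P[nbrs]                  # deformed edges
            R[i] = fit_rotation((W[i, nbrs, None] * E).T @ E2)
        # Global step: assemble b (Eq. 8) and solve L x = b.
        b = np.zeros((n, 3))
        for i in range(n):
            for j in np.nonzero(W[i])[0]:
                b[i] += 0.5 * W[i, j] * (R[i] + R[j]) @ (P_ref[i] - P_ref[j])
        b[fixed] = P[fixed]                      # keep constrained positions
        P = np.linalg.solve(A, b)
    return P

The RXMesh version below follows the same structure, but runs the local step and the assembly of \(b\) as GPU kernels and uses a sparse solver for the global step.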

Implementation

The following code snippets show how the algorithm is implemented in RXMesh.

We first deform the input mesh and set constraints on each vertex depending on whether it is fixed, deformed, or free to move. This will be taken into account when calculating the system matrix \(L\).

 // set constraints
    const vec3<float> sphere_center(0.1818329, -0.99023, 0.325066);
    rx.for_each_vertex(DEVICE, [=] __device__(const VertexHandle& vh) {
        const vec3<float> p(deformed_vertex_pos(vh, 0),
                            deformed_vertex_pos(vh, 1),
                            deformed_vertex_pos(vh, 2));

        // fix the bottom
        if (p[2] < -0.63) {
            constraints(vh) = 2;
        }

        // move the jaw
        if (glm::distance(p, sphere_center) < 0.1) {
            constraints(vh) = 1;
        }
    });

In the code above, we set the mesh to deform the dragon’s jaw and fix its bottom vertices (legs). This can vary depending on the type of input we want to have. Another possibility is to add a feature for users to click and select vertices to deform or fix them in space interactively.

Before we perform the iteration step, we calculate the Laplacian, which is built from the cotangent edge weight matrix. To do so, we create a kernel using RXMesh.

    auto calc_mat = [&](VertexHandle v_id, VertexIterator& vv) {
        if (constraints(v_id, 0) == 0) {
            // free vertex: standard cotangent Laplacian row
            for (int i = 0; i < vv.size(); i++) {
                laplace_mat(v_id, v_id) += weight_matrix(v_id, vv[i]);
                laplace_mat(v_id, vv[i]) -= weight_matrix(v_id, vv[i]);
            }
        } else {
            // constrained vertex: identity row so its position stays fixed
            for (int i = 0; i < vv.size(); i++) {
                laplace_mat(v_id, vv[i]) = 0;
            }
            laplace_mat(v_id, v_id) = 1;
        }
    };

The weight matrix is calculated using a special query called “EVDiamond”, where we obtain the two end vertices and the two opposite vertices of an edge. This gives us exactly the data we need to find the cotangent edge weight.

    auto calc_weights = [&](EdgeHandle edge_id, VertexIterator& vv) {
        // the edge goes from p-r while the q and s are the opposite vertices
        const rxmesh::VertexHandle p_id = vv[0];
        const rxmesh::VertexHandle r_id = vv[2];
        const rxmesh::VertexHandle q_id = vv[1];
        const rxmesh::VertexHandle s_id = vv[3];

        if (!p_id.is_valid() || !r_id.is_valid() || !q_id.is_valid() ||
            !s_id.is_valid()) {
            return;
        }

        const vec3<T> p(coords(p_id, 0), coords(p_id, 1), coords(p_id, 2));
        const vec3<T> r(coords(r_id, 0), coords(r_id, 1), coords(r_id, 2));
        const vec3<T> q(coords(q_id, 0), coords(q_id, 1), coords(q_id, 2));
        const vec3<T> s(coords(s_id, 0), coords(s_id, 1), coords(s_id, 2));


        T weight = 0;
        weight += dot((p - q), (r - q)) / length(cross(p - q, r - q));
        weight += dot((p - s), (r - s)) / length(cross(p - s, r - s));

        weight /= 2;
        weight = std::max(0.f, weight);

        edge_weights(p_id, r_id) = weight;
        edge_weights(r_id, p_id) = weight;
    };

We now have all the building blocks necessary to perform an iteration step. This is what an iteration loop looks like:

    // process step
    for (int i = 0; i < iterations; i++) {
        // solver for rotation
        calculate_rotation_matrix<float, CUDABlockSize>
            <<<lb_rot.blocks, lb_rot.num_threads, lb_rot.smem_bytes_dyn>>>(
                rx.get_context(),
                ref_vertex_pos,
                deformed_vertex_pos,
                rotations,
                weight_matrix);

        // solve for position
        calculate_b<float, CUDABlockSize>
            <<<lb_b_mat.blocks,
               lb_b_mat.num_threads,
               lb_b_mat.smem_bytes_dyn>>>(rx.get_context(),
                                          ref_vertex_pos,
                                          deformed_vertex_pos,
                                          rotations,
                                          weight_matrix,
                                          b_mat,
                                          constraints);

        laplace_mat.solve(b_mat, *deformed_vertex_pos_mat);
    }

We perform multiple iterations of the algorithm to effectively minimize the energy. We can see in the code above that we alternately solve for rotations and then for positions. We first calculate the rotation matrices given a set of deformed positions, then calculate the \(b\) matrix (as defined earlier), and then solve the system of equations to identify the new vertex positions.

The following kernel calculates the rotation matrix per vertex to minimize the rigidity energy given their positions.

auto cal_rot = [&](VertexHandle v_id, VertexIterator& vv) {
        Eigen::Matrix3f S = Eigen::Matrix3f::Zero();

        for (int j = 0; j < vv.size(); j++) {

            float w = weight_matrix(v_id, vv[j]);

            Eigen::Vector<float, 3> pi_vector = {
                ref_vertex_pos(v_id, 0) - ref_vertex_pos(vv[j], 0),
                ref_vertex_pos(v_id, 1) - ref_vertex_pos(vv[j], 1),
                ref_vertex_pos(v_id, 2) - ref_vertex_pos(vv[j], 2)};


            Eigen::Vector<float, 3> pi_dash_vector = {
                deformed_vertex_pos(v_id, 0) - deformed_vertex_pos(vv[j], 0),
                deformed_vertex_pos(v_id, 1) - deformed_vertex_pos(vv[j], 1),
                deformed_vertex_pos(v_id, 2) - deformed_vertex_pos(vv[j], 2)};

            S = S + w * pi_vector * pi_dash_vector.transpose();
        }

        Eigen::Matrix3f U;         // left singular vectors
        Eigen::Matrix3f V;         // right singular vectors
        Eigen::Vector3f sing_val;  // singular values

        svd(S, U, sing_val, V);

        Eigen::Index smallest_singular_index;
        sing_val.minCoeff(&smallest_singular_index);

        Eigen::Matrix3f R = V * U.transpose();

        // if the determinant is negative, flip the column of U associated with
        // the smallest singular value to obtain a proper rotation
        if (R.determinant() < 0) {
            U.col(smallest_singular_index) =
                U.col(smallest_singular_index) * -1;
            R = V * U.transpose();
        }

        // Matrix R to vector attribute R

        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                rotations(v_id, i * 3 + j) = R(i, j);
    };

We first find the Singular Value Decomposition (SVD) of the sum (\(S_{i} \)), where

\(S_{i}=\sum_{j\in N(i)} w_{ij}e_{ij}e_{ij}^{\prime T} \) (Equation 5 in Sorkine and Alexa (2007))

We then use the SVD to calculate the rotation and assign it as the vertex’s rotation. After we calculate the rotations, we calculate the right-hand side of Equation 8 as stated earlier.

auto init_lambda = [&](VertexHandle v_id, VertexIterator& vv) {
        // variable to store ith entry of b_mat
        Eigen::Vector3f bi(0.0f, 0.0f, 0.0f);

        // get rotation matrix for ith vertex
        Eigen::Matrix3f Ri = Eigen::Matrix3f::Zero(3, 3);

        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++)
                Ri(i, j) = rotations(v_id, i * 3 + j);
        }

        for (int nei_index = 0; nei_index < vv.size(); nei_index++) {
            // get rotation matrix for neighbor j
            Eigen::Matrix3f Rj = Eigen::Matrix3f::Zero(3, 3);
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                    Rj(i, j) = rotations(vv[nei_index], i * 3 + j);

            // find rotation addition
            Eigen::Matrix3f rot_add = Ri + Rj;
            // find coord difference
            Eigen::Vector3f vert_diff = {
                ref_vertex_pos(v_id, 0) - ref_vertex_pos(vv[nei_index], 0),
                ref_vertex_pos(v_id, 1) - ref_vertex_pos(vv[nei_index], 1),
                ref_vertex_pos(v_id, 2) - ref_vertex_pos(vv[nei_index], 2)};

            // update bi
            bi = bi +
                 0.5 * weight_mat(v_id, vv[nei_index]) * rot_add * vert_diff;
        }

        if (constraints(v_id, 0) == 0) {
            b_mat(v_id, 0) = bi[0];
            b_mat(v_id, 1) = bi[1];
            b_mat(v_id, 2) = bi[2];
        } else {
            b_mat(v_id, 0) = deformed_vertex_pos(v_id, 0);
            b_mat(v_id, 1) = deformed_vertex_pos(v_id, 1);
            b_mat(v_id, 2) = deformed_vertex_pos(v_id, 2);
        }
    };

After calling both these kernels, we use the solve function to solve the equations. We then update the mesh with this new set of vertex positions.

Results

Using the implementation above, we can create a per-frame ARAP implementation to animate movement for a mesh. In the example below, we fix the upper half vertices of Spot the Cow and allow the bottom half to freely deform. We then move each of the bottom “feet” vertices in every frame to animate a swimming or walking-like movement in Spot.

The bottom feet vertices are user-deformed, the upper vertices are fixed, and the bottom half (except the feet) is free to move, as determined by the ARAP algorithm.

Here’s another example with the dragon mesh.

Deform a vertex of the dragon’s jaw while keeping its feet fixed.

Further work

Polyscope shows us that the framerate of these demos is very poor (below 30 fps). We would expect it to do better, especially when using the GPU. A performance analysis, as well as optimization of the solver, will be required to make the application reach the desired level of real-time performance.

References

[1] Olga Sorkine and Marc Alexa. 2007. As-rigid-as-possible surface modeling. In Proceedings of the fifth Eurographics symposium on Geometry processing (SGP ’07). Eurographics Association, Goslar, DEU, 109–116. https://doi.org/10.2312/SGP/SGP07/109-116

This blog was written by Sachin Kishan during the SGI 2024 Fellowship as one of the outcomes of a two-week project under the mentorship of Ahmed Mahmoud and the support of Supriya Gadi Patil as a teaching assistant.


Graph-based Optimal Transport for Keypoint Matching: Extending the Sinkhorn Solver

Project Mentors: Mahdi Saleh and Dani Velikova

SGI Volunteer: Matheus Araujo

Optimal Transport and the Sinkhorn Algorithm:

Optimal Transport (OT), intuitively speaking, finds the most efficient way to transport mass from one probability distribution to another. For example, suppose the surface of a beach is given as the underlying space \(X\), and the various distributions of sand on the surface are represented by probability distributions on \(X\). Suppose also that we are given a cost function \(C:X\times X\to\mathbb{R}_{\geq 0} \), where \(C(x,y)\) represents the cost of moving a single grain of sand from point \(x\) to point \(y\). The OT problem in this setting is to move the sand from a source distribution to a target distribution while minimizing the total transportation cost. Working with continuous distributions is often not feasible computationally, so a common approach is to discretize distributions, say by dividing the beach into a finite collection of bins and extracting an optimal transport plan for moving sand amongst this finite collection.

In the discrete case, the transportation cost can be represented by a cost matrix \(C\), where \(C_{i,j}\) is the cost of moving a grain of sand from bin \(i\) to bin \(j\). The Wasserstein distance between two distributions is defined as the transportation cost of the minimal transport plan and can be obtained via the Sinkhorn algorithm. Sinkhorn’s theorem states that if \(A\) is a square matrix with strictly positive elements, then there exist diagonal matrices \(D_1\) and \(D_2\) with strictly positive diagonal elements such that \(T = D_1AD_2\) is doubly stochastic (all rows and columns sum to 1). The algorithm rescales the rows and columns of the input matrix \(A\) iteratively until it converges to the required doubly stochastic matrix \(T\). \(T_{ij}\) gives the amount of mass that must be transported from bin \(i\) to bin \(j\) in the minimal transport plan.

Representation of the cost matrix for the 1D problem
Image from https://alexhwilliams.info/itsneuronalblog/2020/10/09/optimal-transport/
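To make the row-and-column rescaling concrete, here is a minimal NumPy sketch of the entropically regularized Sinkhorn iteration that OT libraries typically implement; the histograms a and b, the cost matrix C, and the regularization strength reg are illustrative placeholders rather than values from our project:

import numpy as np

def sinkhorn(a, b, C, reg=0.05, iterations=200):
    # Alternately rescale rows and columns of K = exp(-C / reg) until the
    # transport plan T = diag(u) K diag(v) has row sums a and column sums b.
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iterations):
        u = a / (K @ v)        # match the row marginals
        v = b / (K.T @ u)      # match the column marginals
    return u[:, None] * K * v[None, :]

# Example: uniform "sand" over 4 source bins and 5 target bins.
# T = sinkhorn(np.ones(4) / 4, np.ones(5) / 5, np.random.rand(4, 5))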

6D Pose Estimation using the Sinkhorn Solver:

6D pose estimation refers to the process of determining the position and orientation (together called the pose) of an object (given as a CAD model) in three-dimensional space from an image of that object. Seeing pixels as bins and pixel values as the number of grains of sand, the Sinkhorn algorithm can be repurposed to find a transport plan matching pixels in the image to points in \(\mathbb{R}^3\). However, if the object is symmetric, there may be ambiguity in its pose, and one could get multiple poses that match the image. In this case, the standard Sinkhorn approach fails. Inspired by the work in the SuperGlue {1} and EPOS {2} papers, we aimed to extend the differentiable Sinkhorn solver to cases with one-to-many solutions and incorporate graph neural networks to estimate the 6D pose of such symmetric objects.

The proposed pipeline for the project
6D Object Pose Estimation
Image from Huang, Junwen, et al. “MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images.” CVPR, 2024.

The hurdles were, first, to come up with a cost matrix that best suited the problem; second, to implement a one-to-many Sinkhorn solver based on the Python OT library; and third, to use a Perspective-n-Point {5} approach to find the pose for each of the solutions returned by the Sinkhorn solver. Another challenge was finding the correct axis of symmetry of the pose; we planned to first assume prior information about the axis of symmetry and later incorporate the problem of finding this axis itself. Unfortunately, we decided not to continue with the second week of the project and hence were not able to produce concrete results. However, a brief description of our ideas for overcoming each hurdle is given below.

Ambiguity due to symmetry
Image from Manhardt, Fabian, et al. “Explaining the ambiguity of object detection and 6d pose from visual data.” ICCV 2019.

Ideas for the Implementation:

The image was meant to be a 2D point cloud with features from which we could determine a partial 3D point cloud. One idea we had for the cost matrix, in the case where the axis of symmetry is known, was to add a symmetry-aware cost to reduce the cost for matches that are consistent with the given symmetry. We could also try to find the translation vector by inferring it from points on the axis of symmetry of the CAD and the corresponding matches in the image. Once we do this, we will have to deal with the rotation vector, which can be optimized separately using the Sinkhorn algorithm. 

The special Euclidean group SE(3) consists of all combinations of rotations and translations of \(\mathbb{R}^3\). If we have no information about the axis of symmetry, then this group forms the set of all possible 6D poses. In this case, we have to find poses in SE(3) that minimize the transportation cost. We could try to compare images of the CAD model after applying a pose with the 2D image point cloud and minimize the distance between the two to find the possible poses of the image.

One approach to extend the existing Sinkhorn solver to get multiple solutions is to first use the existing one to get a single solution, and then find all transport matrices that have a similar cost to the one obtained. Another is to design a collection of cost matrices rather than a single one, in a way that the existing Sinkhorn solver produces multiple (and hopefully all) possible solutions. We could get this collection of matrices by applying various poses to the original CAD as before and compute the cost of going from the modified CADs to the image.

Acknowledgements:

During the one week of the project, I learned so much about optimal transport, image matching, and pose estimation. I am very grateful to our mentors who supported and guided our progress.

References:

{1} Sarlin, Paul-Edouard, et al. “SuperGlue: Learning Feature Matching with Graph Neural Networks.” CVPR, 2020, https://doi.org/10.48550/arXiv.1911.11763.

{2} Hodan, Tomas, et al. “EPOS: Estimating 6D Pose of Objects with Symmetries.” CVPR, 2020, https://doi.org/10.48550/arXiv.2004.00605.

{3} Manhardt, Fabian, et al. “Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data.” ICCV, 2019, https://doi.org/10.48550/arXiv.1812.00287

{4} Williams, Alex H. “A Short Introduction to Optimal Transport and Wasserstein Distance.” It’s Neuronal!, October 9, 2020. https://alexhwilliams.info/itsneuronalblog/2020/10/09/optimal-transport/.

{5} OpenCV – Perspective-n-Point (PnP) pose computation, https://docs.opencv.org/4.x/d5/d1f/calib3d_solvePnP.html




Approximating Surfaces with Meshes: Some Experiments

Project Mentor: Nicholas Sharp 

SGI Volunteer: Qi Zhang

This blog post is a follow-up to “How to match the “wiggliness” of two shapes”, where we discussed the metrics we use to measure the similarity between two surfaces. In that post, we briefly talked about approximating a surface by using a regular grid to generate vertices for an approximation mesh. There we observed that as we increased the number of vertices, the surface approximation error decreased. This is somewhat intuitive: the finer the approximation mesh, the more closely you would expect it to approximate the actual surface. However, in practical applications we aim to keep the number of vertices as low as possible, in order to reduce both memory usage and the amount of runtime computation.

So what can we do to improve the surface approximation without increasing the number of vertices? Are there better ways to select vertices – such that we can minimize the chamfer distance error? This is what we will be exploring in this post.
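As a reference for the experiments below, here is a rough NumPy sketch of the setup we have in mind: sample a height-field surface \(z = f(x, y)\) on a regular grid and measure a chamfer-style distance against a much denser sampling of the same surface. This is one common symmetric variant of the chamfer distance; the exact convention behind the plots may differ.

import numpy as np

def chamfer_distance(P, Q):
    # Symmetric chamfer distance between point sets P (n x 3) and Q (m x 3):
    # mean nearest-neighbor distance in both directions.
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return D.min(axis=1).mean() + D.min(axis=0).mean()

def grid_samples(f, n, lo=-1.0, hi=1.0):
    # Sample z = f(x, y) at the vertices of a regular n x n grid.
    xs, ys = np.meshgrid(np.linspace(lo, hi, n), np.linspace(lo, hi, n))
    return np.column_stack((xs.ravel(), ys.ravel(), f(xs, ys).ravel()))

# Example: error of an 8x8 grid against a dense 100x100 "ground truth" sampling.
saddle = lambda x, y: x * x - y * y
err = chamfer_distance(grid_samples(saddle, 8), grid_samples(saddle, 100))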

Regular grids

    Before we try different ways of selecting vertex positions, we first repeat the same experiment with a few different surfaces, using a regular grid to generate mesh vertices. We start with three basic cases with different Gaussian curvatures:

    1. Sphere function (positive Gaussian curvature)
      Figure 1. Sphere function approximated with regular grid vertices.
      Figure 2. Log-log error plot of grid vertices dimensions and chamfer distance error for the sphere function

      2. Saddle function (negative Gaussian curvature)

      Figure 3. Saddle function approximated with regular grid vertices.
      Figure 4. Log-log error plot of grid vertices dimensions and chamfer distance error for the saddle function

      3. Quadratic function (zero Gaussian curvature)

      Figure 5. Quadratic function approximated with regular grid vertices.
      Figure 6. Log-log error plot of grid vertices dimensions and chamfer distance error for the quadratic function

      So far, everything behaves as in the previous blog post – the error decreases consistently as the grid resolution increases.

      In order to really visualize the limitations of using the regular grid to sample points on the mesh, we decided to also include a more “extreme” function that looks harder to approximate – the Cross-in-tray function, which has a few asymptotes.

      4. Cross-in-tray function

      Figure 7. Cross-in-tray function approximated with regular grid vertices

      For this function, we noticed something unusual. When we increase the grid resolution from 6×6, to 7×7, to 8×8, we notice a sharp increase and then a decrease in error, unlike the consistent decrease in error we saw before.

      Figure 8. Error plot of grid vertices dimensions and chamfer distance error for the cross-in-tray function

      We decided to visualize what might be causing this in Figures 9-11 below, where the green transparent surface represents a close approximation of the actual surface, and the magenta surface shows the surface generated by sampling the regular grids with the different dimensions mentioned.

      Figure 9. Approximation with 6×6 grid vertices
      Figure 10. Approximation with 7×7 grid vertices
      Figure 11. Approximation with 8×8 grid vertices

      As seen above, the increase in the approximation error for the 7×7 regular grid is caused by the alignment of the selected vertices with the planes that the asymptotes lie on.

      This is an interesting observation, as we can visibly see that the alignment of the vertices to certain features or properties of a surface affects the surface approximation error. There have been works [1][2] that touch on heuristics suggesting that when the mesh edges are orthogonal to the direction of principal curvature of the original function, the approximation error is often (but not always!) lower. The next two types of experiments look into this further.

      Rotated surface function

      We decided to test the alignment of the edges to the direction of principal curvature on the quadratic function, as we could easily identify this direction for this function. We rotated the surface by different angles and generated approximations using the same regular grid, then plotted its error.

      Figure 12. Approximation of unrotated quadratic function
      Figure 13. Approximation of quadratic function rotated 45 degrees around the z axis

      Interestingly, the error is lowest when the quadratic function is rotated by 45 degrees and 225 degrees, and higher elsewhere. This is notable because when the function is rotated 0 or 90 degrees, the short edges of the triangles are in fact orthogonal to the direction of principal curvature, so according to the heuristic these cases should have given the lowest error.

      After closer observation, we realized that the 45-degree and 225-degree cases are the ones in which the longest edge of the triangles is orthogonal to the direction of principal curvature of the quadratic surface. Perhaps more work can be done to look into this in the future!

      Re-meshed grid

      We also use the Botsch-Kobbelt re-meshing algorithm to re-mesh the regular grid, to introduce more randomness in the alignment of the triangles to the principal curvature of the quadratic function. This algorithm takes in the surface as well as the desired average edge length of the triangles in the mesh, and outputs a re-meshed surface. The images from left to right demonstrate what happens as we increase the average edge length that we pass in as a parameter.

      Figure 15. Approximation of surface using a re-meshed grid with 0.15 desired edge length
      Figure 16. Approximation of surface using a re-meshed grid with 0.07 desired edge length
      Figure 17. Error plot of input desired re-meshed edge length and chamfer distance error

      To make a more equal comparison, we converted the grid resolution to find the average edge length of the triangles in the regular grids and plotted its error together on a log-log graph. As we can see below, the regular grid does better than the Botsch-Kobbelt re-meshed grid. This is congruent with what we expected – that aligning the edges of the triangles would give us a lower error than just random orientations!

      Figure 18. Comparison of the chamfer distance error for the regular grid and the Botsch-Kobbelt re-meshed grid

      References:

      [1] Bommes, David, et al. “Mixed-integer quadrangulation.” ACM Transactions on Graphics, vol. 28, no. 3, July 2009, pp. 1–10, doi:10.1145/1531326.1531383.

      [2] Jakob, Wenzel, et al. “Instant Field-aligned Meshes.” ACM Transactions on Graphics, vol. 34, no. 6, Nov. 2015, pp. 1–15, doi:10.1145/2816795.2818078.


      Loop Subdivision for TetSphere Splatting

      Project Mentor: Minghao Guo

      SGI Volunteer: Shanthika Naik

      Introduction:

      TetSphere Splatting [Guo et al., 2024] is a cutting-edge technique for high-quality 3D shape reconstruction using tetrahedral meshes. This method stands out by delivering superior geometry without relying on neural networks or post-processing. Unlike traditional Eulerian methods, TetSphere Splatting excels in applications such as single-view 3D reconstruction and text-to-3D generation.

      Our project aimed to enhance TetSphere Splatting through two key improvements:

      1. Geometry Optimization: We integrated subdivision and remeshing techniques to refine the geometry optimization process, enabling the capture of finer details and better tessellation.
      2. Adaptive TetSphere Modification: We developed adaptive mechanisms for splitting and merging tetrahedral spheres, enhancing flexibility and detail in the optimization process.

      Setting up the Code Base:

      Our mentor, Minghao, facilitated the setup of the codebase in a cloud environment equipped with GPU support. Using a VSCode tunnel, we were able to work remotely and make modifications within the same environment. Following the instructions in the TetSphere Splatting GitHub README file, we successfully initialized the TetSpheres and executed the TetSphere Splatting process for our input images, including those of a white dog and a cartoon boy.

      Geometry Optimization:

      To improve the geometry optimization in TetSphere Splatting, we explored various algorithms for adaptive remeshing and subdivision. Initially, we considered parametrization-based remeshing using Centroidal Voronoi Tessellation (CVT) but opted for direct surface remeshing due to its efficiency and simplicity. For subdivision, we chose the Loop subdivision algorithm, as it is specifically designed for triangle meshes and better suited for our application compared to Catmull-Clark.

      Implementing Loop Subdivision:

      In the second week of the project, I worked on implementing the Loop subdivision algorithm and incorporating it into the geometry optimization pipeline. Below is a detailed explanation of the algorithm and the method of implementation, using the example of the white dog mesh after TetSphere splatting.

      1. Importing the required libraries, loading and initializing the mesh:

      Instead of working throughout with numpy arrays, I chose to use defaultdict, a subclass of Python dictionaries, to reduce the running time of the program. The unique_edges function defined below ensures that the list of edges doesn’t count each edge twice, once as (e1, e2) and once as (e2, e1).

      import trimesh
      import numpy as np
      from collections import defaultdict
      
      init_mesh = trimesh.load("./results/a_white_dog/final/final_surface_mesh.obj")
      V = init_mesh.vertices
      F = init_mesh.faces
      E = init_mesh.edges
      
      def unique_edges(edges): 
          sorted_edges = np.sort(edges, axis=1)
          unique_edges = np.unique(sorted_edges, axis=0)
          return unique_edges
      2. Edge-Face Adjacency Mapping

      This maps edges to the faces they belong to, which will later help to identify boundary and interior edges.

      def compute_edge_face_adjacency(faces):
          edge_face_map = defaultdict(list)
          for i, face in enumerate(faces):
              v0, v1, v2 = face
              edges = [(min(v0, v1), max(v0, v1)),
                       (min(v1, v2), max(v1, v2)),
                       (min(v2, v0), max(v2, v0))]
              for edge in edges:
                  edge_face_map[edge].append(i)
          return edge_face_map
      3. Computing New Vertices:

      If an edge is a boundary edge (it belongs to only one face), then the new vertex is simply the midpoint of the two vertices at the ends of the edge. If not, the edge belongs to two faces, and the new vertex is a weighted average of the four vertices which make up the two faces, with a weight of 3/8 for the vertices on the edge and 1/8 for the vertices not on the edge. The function also produces indices for the new vertices.

      (a) The weights for an interior edge; (b) the weights for a boundary edge. Image from https://www.pbr-book.org/3ed-2018/Shapes/Subdivision_Surfaces
      def compute_new_vertices(edge_face_map, vertices, faces):
          new_vertices = defaultdict(list)
          i = vertices.shape[0] - 1
          for edge, facess in edge_face_map.items():
              v0, v1 = edge
              if len(facess) == 1:  # Boundary edge
                  i += 1
                  new_vertices[edge].append(((vertices[v0] + vertices[v1]) / 2, i))
              elif len(facess) == 2:  # Internal edge
                  i += 1
                  adjacent_vertices = []
                  for face_index in facess:
                      face = faces[face_index]
                      for vertex in face:
                          if vertex != v0 and vertex != v1:
                              adjacent_vertices.append(vertex)
                  v2 = adjacent_vertices[0]
                  v3 = adjacent_vertices[1]
                  new_vertices[edge].append(((1 / 8) * (vertices[v2] + vertices[v3]) + 
                                             (3 / 8) * (vertices[v0] + vertices[v1]), i))
          return new_vertices
      4. Create a dictionary of updated vertices:

      This generates updated vertices that include both old and newly created vertices.

      def generate_updated_vertices(new_vertices, vertices, edges):
          updated_vertices = defaultdict(list)
          for i in range(vertices.shape[0]):
              vertex = vertices[i, :]
              updated_vertices[i].append(vertex)
          for i in range(edges.shape[0]):
              e1, e2 = edges[i, :]
              edge = (min(e1, e2), max(e1, e2))
              vertex, index = new_vertices[edge][0]
              updated_vertices[index].append(vertex)
          return updated_vertices
      5. Construct New Faces:

      This constructs new faces using the updated vertices. Each old face is split into four new faces.

      The four child faces created, ordered such that the ith child face is adjacent to the ith vertex of the original face and the fourth child face is in the center of the subdivided face. Image from https://www.pbr-book.org/3ed-2018/Shapes/Subdivision_Surfaces
      def create_new_faces(faces, new_vertices):
          new_faces = []
          for face in faces:
              v0, v1, v2 = face
              edge_0 = tuple(sorted((v0, v1)))
              edge_1 = tuple(sorted((v1, v2)))
              edge_2 = tuple(sorted((v2, v0)))
      
              e0 = new_vertices[edge_0][0][1]
              e1 = new_vertices[edge_1][0][1]
              e2 = new_vertices[edge_2][0][1]
      
              new_faces.append([v0, e0, e2])
              new_faces.append([v1, e1, e0])
              new_faces.append([v2, e2, e1])
              new_faces.append([e0, e1, e2])
          
          return np.array(new_faces)
      6. Modifying Old Vertices Based on Adjacency:

      First, the function compute_vertex_adjacency finds the neighbors of each vertex. Then modify_vertices weights each of the neighbor vertices of each old vertex by a weight β (which is defined in the code below) and weights the old vertex by 1-nβ, where n is the valence of the old vertex. It does not change the new vertices.

      Computing the new position of the old vertex v. Image from https://www.pbr-book.org/3ed-2018/Shapes/Subdivision_Surfaces
      def compute_vertex_adjacency(faces):
          vertex_adj_map = defaultdict(set)
          for face in faces:
              v0, v1, v2 = face
              vertex_adj_map[v0].add(v1)
              vertex_adj_map[v0].add(v2)
              vertex_adj_map[v1].add(v0)
              vertex_adj_map[v1].add(v2)
              vertex_adj_map[v2].add(v0)
              vertex_adj_map[v2].add(v1)
          return vertex_adj_map
      
      def modify_vertices(vertices, updated_vertices, vertex_adj_map):
          modified_vertices = defaultdict(list)
          for i in range(len(updated_vertices)):
              if i in range(vertices.shape[0]):
                  vertex = vertices[i,:]
                  neighbors = vertex_adj_map[i]
                  n = len(neighbors)
                  beta = (5 / 8 - (3 / 8 + 1 / 4 * np.cos(2 * np.pi / n)) ** 2)/n
                  weight_v = 1 - beta * n
                  modified_vertex = weight_v * vertex
                  for neighbor in neighbors:
                      neighbor_point = updated_vertices[neighbor][0]
                      modified_vertex += beta * neighbor_point
                  modified_vertices[i].append(modified_vertex)
              else:
                  modified_vertices[i].append(updated_vertices[i][0])
          return modified_vertices
      7. Generating the Final Subdivided Mesh:

      Now, all that is left is to collect the modified vertices and faces as numpy arrays and create the final subdivided mesh. The function loop_subdivision_iter simply applies the Loop subdivision iteratively n times, further refining the mesh each time.

      def create_vertices_array(modified_vertices):
          vertices_list = [modified_vertices[i][0] for i in range(len(modified_vertices))]
          return np.array(vertices_list)
      
      def loop_subdivision(vertices, faces, edges):
          edges = unique_edges(edges)
          edge_face_map = compute_edge_face_adjacency(faces)
          new_vertices = compute_new_vertices(edge_face_map, vertices, faces)
          updated_vertices = generate_updated_vertices(new_vertices, vertices, edges)
          new_faces = create_new_faces(faces, new_vertices)
          vertex_adj_map = compute_vertex_adjacency(new_faces)
          modified_vertices = modify_vertices(vertices, updated_vertices, vertex_adj_map)
          vertices_array = create_vertices_array(modified_vertices)
          return vertices_array, new_faces
      
      def loop_subdivision_iter(vertices, faces, n):
          # Recompute the unique edge list from the current faces at every level.
          def unique_edges(faces):
              edges = np.vstack([
                  np.column_stack((faces[:, 0], faces[:, 1])),
                  np.column_stack((faces[:, 1], faces[:, 2])),
                  np.column_stack((faces[:, 2], faces[:, 0]))
              ])
              sorted_edges = np.sort(edges, axis=1)
              return np.unique(sorted_edges, axis=0)
      
          if n == 0:
              # Base case: no further subdivision; return the mesh as-is.
              return vertices, faces
          edges = unique_edges(faces)
          vertices, faces = loop_subdivision(vertices, faces, edges)
          return loop_subdivision_iter(vertices, faces, n - 1)
      
      new_V, new_F = loop_subdivision_iter(V, F, 1)
      subdivided_mesh = trimesh.Trimesh(vertices=new_V, faces=new_F)
      subdivided_mesh.export("./a_white_dog_subdivided.obj")
      

      Results:

      I used the above code to obtain the subdivided meshes for four input meshes – a white dog, a cartoon boy, a camel, and a horse. The results for each of these are displayed below.

      Conclusion:

      The project successfully enhanced the TetSphere Splatting method through geometry optimization and adaptive TetSphere modification. The implementation of Loop subdivision refined mesh detail and smoothness, as evidenced by the improved results for the test meshes. The adaptive mechanisms introduced greater flexibility, contributing to more precise and detailed 3D reconstructions.

      Through this project I learned a lot about TetSphere Splatting, geometry optimization, and Loop subdivision. I am very grateful to Minghao and Shanthika, who supported and guided us in our progress.

      References:

      {1} Guo, Minghao, et al. “TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes”, https://doi.org/10.48550/arXiv.2405.20283

      {2} Kerbl, Bernhard, et al. “3D Gaussian Splatting for Real-Time Radiance Field Rendering.” SIGGRAPH 2023, https://doi.org/10.48550/arXiv.2308.04079.

      {3} “Remeshing I”, https://graphics.stanford.edu/courses/cs468-12-spring/LectureSlides/13_Remeshing1.pdf

      {4} “Subdivision”, https://www.cs.cmu.edu/afs/cs/academic/class/15462-s14/www/lec_slides/Subdivision.pdf

      {5} Pharr, Matt, et al. “Physically Based Rendering: From Theory To Implementation – 3.8 Subdivision Surfaces”, https://www.pbr-book.org/3ed-2018/Shapes/Subdivision_Surfaces.


      The Origins of ‘Bug’ and ‘Debugging’ in Computing and Engineering

      By Ehsan Shams

      During these six fascinating weeks at SGI, I have said and heard the words “bug” and “debugging” so often that if I had an inch for every time, I could go from Egypt to Tahiti and back a few dozen times. Haha! As an etymology enthusiast, I couldn’t resist digging into the origins of these terms. Here’s what I discovered.

      The use of the term “bug” in engineering and technology contexts dates back to at least the 19th century. The term was used by Thomas Edison in a letter written in 1878 to Tivadar Puskás, in which Edison referred to minor engineering flaws or difficulties as “bugs,” implying that these small issues were natural in the process of invention and experimentation.

      Specifically, Edison wrote about the challenges of refining his inventions in this letter, stating:

      “It has been just so in all of my inventions. The first step is an intuition—and comes with a burst, then difficulties arise—this thing gives out and [it is] then that ‘bugs’—as such little faults and difficulties are called—show themselves, and months of intense watching, study and labor are requisite before commercial success—or failure—is certainly reached.”

      Thomas Edison and His ‘Insomnia Squad’: Debugging Inventions All Night and Napping Through the Day. Photo: Bettmann/Getty Images

      The term gained further prominence in the mid-20th century, especially in computing, when engineers working on the Mark II computer at Harvard University in 1947 found that the computer was malfunctioning. Upon investigation, Grace Hopper discovered that a moth had become trapped in one of the machine’s relays (Relay #70 on Panel “F” of the Harvard Mark II Aiken Relay Calculator), causing the problem; she took a photo of it and documented it.

      Harvard Mark II Aiken Relay Calculator

      The engineers removed the moth and taped it into the log book with the note: “First actual case of a bug being found.” While the term “bug” had already been in use to describe technical issues or glitches, this incident provided a literal example of the term, and it is often credited with popularizing the concept of “de-bugging” in computer science.

      The logbook, complete with the taped moth, is preserved in the Smithsonian National Museum of American History as a notable piece of computing history. This incident symbolically linked the term “bug” with computer errors in popular culture and the history of technology.

      It turns out that the term “bug” has a rich etymology, originating in Middle English, where it referred to a ghost or hobgoblin that troubled and scared people. By the 16th century, “bug” started to be used to describe insects, particularly those that were seen as pests, such as bedbugs, which cause discomfort. Then, in the 19th century, engineers adopted “bug” as a metaphor for small flaws or defects in machinery, likening these issues to tiny pests that could disrupt a system’s operation and cause discomfort.

      It’s hilarious how the word “bug” still sends shivers down our spines—back then, it was ghosts and goblins, and now it could be a missing bracket hiding somewhere in a hundred-something-line program, a variable you forgot to declare, or an infinite loop that makes your code run forever!

      References:

      1. Edison, T. (1878). Letter to Tivadar Puskás. The Thomas Edison Papers. Rutgers University. https://edisondigital.rutgers.edu/document/D7821ZAO#?xywh=-2%2C-183%2C940%2C1090
      2. National Geographic Education. (n.d.). World’s first computer bug. Retrieved August 18, 2024, from https://education.nationalgeographic.org/resource/worlds-first-computer-bug
      3. Hopper, G. M. (1987). The Mark II computer and the first ‘bug’. Proceedings of the ACM History of Programming Languages Conference.



      Introduction to 3D Gaussian Splatting

      Project Mentors: Ilke Demir, Sainan Liu and Alexey Soupikov

      Among novel view synthesis methods, Neural Radiance Fields, or NeRFs, revolutionized implicit 3D scene representations by optimizing a Multi-Layer Perceptron (or MLP) using volumetric ray-marching [1]. While the continuity in these methods helps the optimization procedure, achieving high visual quality requires neural networks that are costly to train and render.

      3D Gaussian Splatting for Real-Time Radiance Field Rendering

      A recent work [2] addresses this issue by proposing a 3D Gaussian representation that achieves equal or better quality than the previous implicit radiance field approaches. This method builds on three main components, namely (1) Structure-from-Motion (SfM), (2) optimization of 3D Gaussian properties, and (3) real-time rendering. The proposed method demonstrates state-of-the-art visual quality and real-time rendering on several established datasets.

      1. Differentiable 3D Gaussian Splatting

      As with previous NeRF-like methods, the 3D Gaussian splatting method takes as input a set of images of a static scene, together with the corresponding cameras calibrated by Structure-from-Motion (SfM) [3]. SfM yields a sparse point cloud without normals, which is used to initialize a set of 3D Gaussians. Following Zwicker et al. [4], the Gaussians are defined as

      \(G(x)=e^{-\frac{1}{2}(x)^T \Sigma^{-1}(x)}, \) (1)

      where \(\Sigma\) is a full 3D covariance matrix defined in world space [4] centered at point (mean) \(\mu\). This Gaussian is multiplied by the parameter \(\alpha\) during the blending process.

      As we need to project the 3D Gaussians to 2D space for rendering, following Zwicker et al. [4] let us define the projection to the image space. Given a viewing transformation \(W\), the covariance matrix \(\Sigma’\) in camera coordinates can be denoted as

      \(\Sigma^{\prime}=J W \Sigma W^T J^T, \) (2)

      where \(J\) is the Jacobian of the affine approximation of the projective transformation.

      The covariance matrix \(\Sigma\) of a 3D Gaussian is analogous to describing the configuration of an ellipsoid. Given a scaling matrix \(S\) and rotation matrix \(R\), we can find the corresponding \(\Sigma\) such that

      \(\Sigma=R S S^T R^T.\) (3)

      This representation of anisotropic covariance allows the optimization of 3D Gaussians to adapt to the geometry of different shapes in captured scenes, resulting in a fairly compact representation.
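      A small NumPy sketch of Equations (2) and (3) may help make the shapes concrete; here R and scale come from an assumed rotation/scale parameterization of one Gaussian, while W and J stand in for the rotational part of the viewing transformation and the Jacobian of the local affine approximation (all placeholders, not the paper's code):

      import numpy as np

      def covariance_3d(R, scale):
          # Eq. (3): Sigma = R S S^T R^T, with S = diag(scale).
          S = np.diag(scale)
          return R @ S @ S.T @ R.T

      def projected_covariance(Sigma, W, J):
          # Eq. (2): Sigma' = J W Sigma W^T J^T (covariance after projection).
          return J @ W @ Sigma @ W.T @ J.T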

      2. Optimization with Adaptive Density Control of 3D Gaussians

      2.1 Optimization of Gaussians

      The optimization step creates a dense set of 3D Gaussians representing the scene for free-view synthesis. In addition to the positions \(p\), opacities \(\alpha\), and covariances \(\Sigma\), the spherical harmonics (SH) coefficients representing the color \(c\) of each Gaussian are also optimized to correctly capture the view-dependent appearance of the scene. The optimization of these parameters is interleaved with steps that control the density of the Gaussians to better represent the scene.

      The optimization is performed through successive iterations of rendering and comparing the resulting image to the training views of the static scene. The loss function combines an \(L_1\) term with a D-SSIM term as

      \(\mathcal{L}=(1-\lambda) \mathcal{L}_1+\lambda \mathcal{L}_{\text {D-SSIM }}. \) (4)
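      As a sketch, this loss might be assembled as follows; the D-SSIM term is left as a placeholder callable, and the default \(\lambda\) is only an assumed value:

      import numpy as np

      def splatting_loss(rendered, target, dssim_fn, lam=0.2):
          # Eq. (4): (1 - lambda) * L1 + lambda * D-SSIM over the image pixels.
          l1 = np.abs(rendered - target).mean()
          return (1.0 - lam) * l1 + lam * dssim_fn(rendered, target)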

      2.2 Adaptive Control of Gaussians

      The optimization step needs to be able to create geometry and also destroy or move geometry when needed. The adaptive control enables a scheme to densify the Gaussians. After an optimization warm-up, densification is performed every 100 iterations. In addition, any Gaussian that is essentially transparent, with \(\alpha < \epsilon_\alpha\), and therefore does not contribute to the representation, is removed.

      Figure 1. demonstrates the densification procedure for the adaptive control of Gaussians.

      Simply put, the densification procedure clones a new Gaussian when the small-scale geometry is not sufficiently covered. In the opposite case, when the small-scale geometry is represented by one large splat, the Gaussian is then split in two. Optimization continues with this new set of Gaussians.
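      Schematically, the per-Gaussian densification decision can be pictured with the following sketch; the gradient and scale thresholds are placeholders, not the paper's exact values:

      import numpy as np

      def densification_masks(scales, grad_norms, grad_thresh=2e-4, scale_thresh=0.01):
          # Gaussians with a large view-space positional gradient are candidates;
          # small candidates are cloned, large candidates are split in two.
          candidates = grad_norms > grad_thresh
          clone = candidates & (scales.max(axis=1) <= scale_thresh)
          split = candidates & (scales.max(axis=1) > scale_thresh)
          return clone, split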

      3. Fast Differentiable Rasterizer for Gaussians

      Tiling and camera view frustum. The method first splits the screen into \(16\times16\) tiles and then culls the 3D Gaussians against the camera view frustum and each tile. Only the Gaussians whose 99% confidence region intersects the frustum are kept. In addition, Gaussians at extreme positions, such as those with means close to the near plane or far outside the view frustum, are rejected.

      Sorting. All Gaussians are instantiated according to the number of tiles they overlap. Each instance is assigned a key that combines view-space depth and tile ID, and the keys are sorted via a single fast GPU radix sort.

      Rasterization. Finally, a list for each tile is produced using the first and last depth-sorted entries that splat to that tile. Rasterization is performed in parallel, with one thread block launched for each tile. Each thread first loads the Gaussians into shared memory and then, for a given pixel, accumulates color and \(\alpha\) values by traversing the list front to back.

      This procedure enables fast overall rendering and fast sorting to allow approximate \(\alpha\)-blending and to avoid hard limits on the number of splats that can receive gradients.
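      The key construction for the radix sort can be pictured with a small NumPy sketch; the bit layout and depth quantization below are illustrative, not the exact CUDA implementation:

      import numpy as np

      def tile_depth_order(tile_ids, view_depths, depth_bits=32):
          # Pack the tile ID into the high bits and the quantized view-space depth
          # into the low bits, so a single sort orders instances by tile, then depth.
          d = np.clip(view_depths / view_depths.max(), 0.0, 1.0)
          depth_q = (d * (2**depth_bits - 1)).astype(np.uint64)
          keys = (tile_ids.astype(np.uint64) << np.uint64(depth_bits)) | depth_q
          return np.argsort(keys, kind="stable")  # stand-in for the GPU radix sort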

      Our Results

      We test the 3D Gaussian splatting method on the recently proposed Neu3D dataset [5]. Note that the original 3D Gaussian splatting work is limited to static scenes, while the Neu3D dataset provides dynamic scenes in time. Therefore, we provide the initial frame from each camera view of the flaming salmon scene as input. This corresponds to 19 camera views in total. We first execute the SfM method, in this case COLMAP, on these frames to obtain the camera poses as discussed in Sec. 1. Next, we optimize the set of 3D Gaussians for 30K iterations.

      Quantitative Evaluation: Metrics and Runtime

      After 30K iterations of training, we report the metrics that are computed on the 3D Gaussian splatting method. \(L_1\) loss corresponds to the average difference between the set of training images and the rendered images as in Sec. 2.1 (Eq. 4). PSNR, or Peak Signal-to-Noise Ratio, corresponds to the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation, measured in decibels. The memory row shows how much memory space the resulting representation occupies.

      L1 loss [30K]: 0.0074
      PSNR [30K]: 37.8426 dB
      Memory [PLY]: 162.5 MB

      Table 1. shows the metrics for 3D Gaussian splatting on the flaming salmon scene of Neu3D.
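      For reference, the PSNR above can be computed from the mean squared error as in this short sketch (assuming images with pixel values in \([0, 1]\)):

      import numpy as np

      def psnr(rendered, target, max_val=1.0):
          # PSNR in decibels: 10 * log10(MAX^2 / MSE).
          mse = np.mean((rendered - target) ** 2)
          return 10.0 * np.log10(max_val ** 2 / mse)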

      We additionally report the runtime for the first two components, i.e. executing the SfM method to initialize the Gaussians and optimizing the Gaussians to obtain the resulting scene representation. The third component, rasterization, is performed in real time.

      COLMAP – Elapsed time: 0.013 minutes
      Training Splats [30K]: 10 minutes

      Table 2. shows the runtime for 3D Gaussian splatting on the flaming salmon scene of Neu3D.

      Qualitative Evaluation

      We demonstrate the results from the model trained on the initial camera frames of the flaming salmon scene of Neu3D. We provide the real-time rendering results from the viewer where the camera is moving in space in Fig. 2.

      Figure 2. shows the results of real-time rendering on SIBR viewer.

      Conclusion and Future Work

      3D Gaussian splatting is the first approach that truly allows real-time, high-quality radiance field rendering while requiring training times competitive with the fastest previous methods. One of its greatest limitations is memory consumption: GPU memory usage can rise to ~20 GB during training, which also leads to high storage costs. Second, the scenes 3D Gaussian splatting handles are static and do not incorporate the additional dimension of time. Recent work addresses these limitations through quantization and through improved Gaussian representations that model both space and time, which we will discuss in the next post.

      References

      [1] Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV 2020.

      [2] Kerbl, B., Kopanas, G., Leimkühler, T., & Drettakis, G. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph. (SIGGRAPH) 2023.

      [3] Snavely, N., Seitz, S. M., & Szeliski, R. Photo Tourism: Exploring Photo Collections in 3D. ACM Trans. Graph. (SIGGRAPH) 2006.

      [4] Zwicker, M., Pfister, H., van Baar, J., & Gross, M. EWA Volume Splatting. IEEE Visualization (VIS ’01), 2001.

      [5] Li, T., Slavcheva, M., Zollhoefer, M., Green, S., Lassner, C., Kim, C., Schmidt, T., Lovegrove, S., Goesele, M., Newcombe, R., et al. Neural 3D Video Synthesis from Multi-View Video. CVPR 2022.

      Categories
      Math

      The Area of a Circle

      Author: Ehsan Shams (Alexandria University, Egypt)

      “Since people have tried to prove obvious propositions,
      they have discovered that many of them are false.”
      Bertrand Russell

      Archimedes proposed an elegant argument indicating that the area of a circle (disk) of radius \(r\) is \( \pi r^2 \). This argument has gained attention recently and is often presented as a “proof” of the formula. But is it truly a proof?

      The argument is as follows: divide the circle into \(2^n\) congruent wedges and re-arrange them into a strip; for any \(n\), the total area of the wedges (the area of the strip) equals the area of the circle. The top and bottom of the strip each have total arc length \( \pi r \). As \( n \to \infty \), the wavy strip becomes a rectangle of width \( \pi r \) and height \( r \). This rectangle has area \( \pi r^2 \), so the area of the circle is also \( \pi r^2 \).

      These images are taken from [1].

      To consider this argument a proof, several things need to be shown:

      1. The circle can be evenly divided into such wedges.
      2. The number \( \pi \) is well-defined.
      3. The notion of “area” of this specific subset of \( \mathbb{R}^2 \) -the circle – is well-defined, and it has this “subdivision” property used in the argument. This is not trivial at all; a whole field of mathematics called “Measure Theory” was founded in the early twentieth century to provide a formal framework for defining and understanding the concept of areas/volumes, their conditions for existence, their properties, and their generalizations in abstract spaces.
      4. The limiting operations must be made precise.
      5. The notions of area and arc length are preserved under the limiting operation of point 4.

      Archimedes’ elegant argument can be made rigorous, but it takes some work and a very long sheet of paper to do so.

      To give some insight into how difficult it is to make precise sense of seemingly obvious, simple-to-grasp geometric notions and intuitive operations that are often taken for granted, let us briefly inspect each of the points above.

      Defining \( \pi \):

      In school, we were taught that \( \pi \) is defined as the ratio of the circumference of a circle to its diameter. For this definition to be valid, it must be shown that this ratio is the same for all circles, which is not immediately obvious; in fact, it does not hold in non-Euclidean geometry.

      A tighter definition goes like this: the number \( \pi \) is the circumference of a circle of diameter 1. Yet, this still leaves some ambiguity because terms like “circumference,” “diameter,” and “circle” need clear definitions, so here is a more precise version: the number \( \pi \) is half the circumference of the circle \( \{ (x,y) | x^2 + y^2 =1 \} \subseteq \mathbb{R}^2 \). This definition is clearer, but it still assumes we understand what “circumference” means.

      From calculus, we know that the circumference of a circle can be defined as a special case of arc length. Arc length itself is defined as the supremum of a set of sums, and in nice cases, it can be exactly computed using definite integrals. Despite this, it is not immediately clear that a circle has a finite circumference.

      Another ancient limiting argument, used since Archimedes to estimate and define the circumference of a circle and hence to compute \( \pi \) to any desired accuracy, relies on inscribed and circumscribed regular polygons. Still, making precise sense of the circumference of a circle, and hence of the number \( \pi \), is quite a subtle matter.

      The Limiting Operation:

      In Archimedes’ argument, the limiting operation can be formalized by regarding the bottom of the wavy strip (curve) as the graph of a function \(f_n \), and the limiting curve as the graph of a constant function \(f\). Then \( f_n \to f \) uniformly.

      The Notion of Area:

      The whole of Euclidean Geometry deals with the notions of “areas” and “volumes” for arbitrary objects in \( \mathbb{R}^2 \) and \( \mathbb{R}^3 \) as if they were inherently defined for such objects and merely needed to be calculated. The calculation was done by cutting the object into finitely many simple parts, rearranging them by rigid motions such as rotations and translations, and reassembling those parts into a simpler body whose area or volume we already know. This type of geometry relies on three hidden assumptions:

      • Every object has a well-defined area or volume.
      • The area or volume of the whole is equal to the sum of the areas or volumes of its parts.
      • The area or volume is preserved under such re-arrangements.

      This is not automatically true for arbitrary objects; for example, consider the Banach-Tarski Paradox. Therefore, proving that a well-defined notion of area exists for the specific subset describing the circle, and that it is preserved under the subdivision, rearrangement, and limiting operations considered, is crucial for validating Archimedes’ argument as a full proof of the area formula. Measure Theory addresses these issues by providing a rigorous framework for defining and understanding areas and volumes. Thus, showing points 1, 3, and 4, and the preservation of the area under the limiting operation, requires some effort but is achievable through Measure Theory.

      Arc Length under Uniform Limits:

      The other part of point 5 is slightly tricky, because arc length is not generally preserved under uniform limits. To illustrate this, consider the staircase approximation of the diagonal of a unit square in \( \mathbb{R}^2 \). Even though the steps get finer and the staircase curves converge uniformly to the diagonal, their total arc length stays at 2 and does not converge to \( \sqrt{2} \). This demonstrates that arc length, as a function of the curve, is not continuous with respect to uniform convergence. However, arc length is preserved under uniform limits if the derivatives of the functions converge uniformly as well. Is this provable in Archimedes’ argument? Yes, with some work.
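
      As a quick computation (the standard one, spelled out here for completeness): the staircase with \(n\) steps consists of \(n\) horizontal and \(n\) vertical segments of length \(1/n\) each, so its length is

      \( L_n = n\left(\frac{1}{n}+\frac{1}{n}\right) = 2 \quad \text{for every } n, \)

      while the diagonal it converges to uniformly has length \( \sqrt{2} \).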

      Moral of the Story:

      There is no “obvious” in mathematics. It is important to prove mathematical statements using strict logical arguments from agreed-upon axioms without hidden assumptions, no matter how “intuitively obvious” they seem to us.

      “The kind of knowledge which is supported only by
      observations and is not yet proved must be carefully
      distinguished from the truth; it is gained by induction, as
      we usually say. Yet we have seen cases in which mere
      induction led to error. Therefore, we should take great
      care not to accept as true such properties of the numbers
      which we have discovered by observation and which are
      supported by induction alone. Indeed, we should use such
      a discovery as an opportunity to investigate more exactly
      the properties discovered and to prove or disprove them;
      in both cases we may learn something useful.”
      L. Euler

      “Being a mathematician means that you don’t take ‘obvious’
      things for granted but try to reason. Very often you will be
      surprised that the most obvious answer is actually wrong.”
      –Evgeny Evgenievich

      Bibliography

      1. Strogatz, S. (2010, April 4). Take It to the Limit. Opinionator, The New York Times. https://archive.nytimes.com/opinionator.blogs.nytimes.com/2010/04/04/take-it-to-the-limit/
      2. Tao, T. (2011). An introduction to measure theory. Graduate Studies in Mathematics. https://doi.org/10.1090/gsm/126
      3. Pogorelov, A. (1987). Geometry. MIR Publishers.
      4. Blackadar, B. (2020). Real Analysis Notes
      Categories
      Research

      Fitting Surfaces with Noise Regularization

      Mentor: Amir Vaxman

      Volunteer: Alberto Tono

      Fellows: Mutiraj Laksanawisit, Diana Aldana Moreno, Anthony Hong

      Our task is to fit closed, watertight surfaces to point clouds that may contain a lot of noise and inconsistent outliers. There are many papers on this problem; we explore a modest variation of it: we allow the system to modify the input point cloud in a controlled manner, where the modification is gradually scaled back until the method fits the original point cloud. The point of this experiment is to check whether such a modification is beneficial for both accuracy and training time.

      Regular Point-Cloud Fitting

      A point cloud is an unordered set of 3D points: \(\mathscr{P}=\left\{x_i, y_i, z_i\right\}\), where \(0<i\leq N\) for some size \(N\). These points can be assumed to come with normals \(n_i\) (unit vectors associated with each point). An implicit surface reconstruction fits a function \(f(x,y,z)\) defined over all of space, such that its zero set
      $$
      S = f^{-1}(0)=\left\{(x,y,z) | f(x,y,z)=0\right\}
      $$
      represents the surface.

      We usually train \(f\) to be a signed distance function (SDF) \(\mathbb{R}^3\rightarrow\mathbb{R}\) from the zero set. That is, to minimize the loss:
      $$
      f = \text{argmin}\left(\lambda_{p}E_p+\lambda_{n}E_n+\lambda_{s}E_s\right)
      $$

      where
      \begin{align}
      E_p &= \sum_{i=0}^N{|f(x_i,y_i,z_i)|^2}\\
      E_n &= \sum_{i=0}^N{|\nabla f(x_i,y_i,z_i)-n_i|^2}\\
      E_s &= \sum_{j=0}^Q{||\nabla f(x_j,y_j,z_j)|-1|^2}
      \end{align}
      Loss term \(E_p\) is for the zero-set fitting, term \(E_n\) is for the normal fitting, and term \(E_s\) regularizes \(f\) to be an SDF; the latter is trained on a randomly sampled set \(\mathscr{Q}=\left(x_j,y_j,z_j\right)\) (of size \(Q=|\mathscr{Q}|\)) in the bounding box of the object, which gets replaced every few iterations. The bounding box should be a bit bigger than the object, say covering 20% more in each axis, to allow for enough empty space around it.
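
      A minimal PyTorch-style sketch of these three terms (our own illustration, not the exact training code; here f is any network mapping 3D points to a scalar, and the argument names are assumptions):

import torch

def sdf_losses(f, points, normals, box_min, box_max, num_samples=1024):
    # points: (N, 3) point-cloud positions; normals: (N, 3) unit normals.
    points = points.clone().requires_grad_(True)
    values = f(points).squeeze(-1)                                    # f(x_i, y_i, z_i)
    grads = torch.autograd.grad(values.sum(), points, create_graph=True)[0]

    E_p = (values ** 2).sum()                                         # zero-set fitting
    E_n = ((grads - normals) ** 2).sum()                              # normal fitting

    # Random samples in the (slightly enlarged) bounding box for the SDF term.
    q = box_min + (box_max - box_min) * torch.rand(num_samples, 3)
    q.requires_grad_(True)
    grad_q = torch.autograd.grad(f(q).sum(), q, create_graph=True)[0]
    E_s = ((grad_q.norm(dim=-1) - 1.0) ** 2).sum()                    # eikonal regularizer

    return E_p, E_n, E_s

      The total loss is then \(\lambda_{p}E_p+\lambda_{n}E_n+\lambda_{s}E_s\), as above.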

      Reconstruction Results

      We extracted meshes from this repository and applied Poisson disc sampling in MeshLab 2023.12 with the point-number parameter set to 1000 (note that the resulting number of sampled points is not exactly 1000; see the table below). We then standardized the points and normalized the normals (the xyz file with normals does not give unit normals in this version of MeshLab, so an extra normalization step is needed). Finally, we ran the learning procedure on these point clouds to fit an SDF for 500 epochs with the other parameters shown below.

      net = train_sdf_net(standardized_points, normals, epochs=500, batch_size=N,
      learning_rate=0.01, lambda_p=1.0, lambda_n=1.5, lambda_s=0, neuron_factor=4)
      

      We then plotted the results (the neural network, the point clouds, and the normals), registered the extracted isosurfaces as meshes in Polyscope, and used slice planes to see how good the fits are. See the table below.

                            Nefertiti   Homer   Kitten
      Vertices of sample:   1422        1441    1449
      (The original meshes, the reconstructions, and six slice-plane views per model are shown as images.)
      Reconstruction results of Nefertiti.obj, Homer.obj, and Kitten.obj.

      Ablation Test

      There are several parameters and methods we can play around with. We did not aim to find an “optimal” set of parameters; our approach is very experimental: we tried different sets of parameters (learning rate, batch size, number of epochs, and the weights of each energy term) and checked how good the fit was each time we changed the set. If the results look good, the set is good; the set we chose is included in the last subsection. We used the ReLU activation function and the results are generally good. We also tried tanh, which gives smoother reconstructed surfaces.

      Adversarial Point Cloud Fitting

      We first add random noise of several levels to the point cloud and try to fit again; the system then performs quite poorly. The noise is added as an \(\epsilon_i\), drawn from a small Gaussian distribution, to each point \((x_i,y_i,z_i)\). We test the result and visualize it for several noise variances.
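
      Adding the noise itself is a one-liner; a NumPy sketch (ours), where sigma is the standard deviation we sweep over:

import numpy as np

def add_gaussian_noise(points, sigma, seed=0):
    # Return a noisy copy of the (N, 3) point cloud: p_i + eps_i, eps_i ~ N(0, sigma^2 I).
    rng = np.random.default_rng(seed)
    return points + rng.normal(scale=sigma, size=points.shape)

# Example sweep over noise levels (relative to the standardized coordinates):
# for sigma in (0.005, 0.01, 0.02):
#     noisy = add_gaussian_noise(standardized_points, sigma)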

      We’ll next reach the point of the project, which is to learn an adversarial noise. We will add another module \(\phi\) that connects between the input and the network, where:
      $$
      \phi(x,y,z) = (\delta, p)
      $$
      \(\phi\) outputs a neural field \(\delta:\mathbb{R}^3\rightarrow \mathbb{R}^3\) that serves as a perturbation of the point cloud, and another neural field \(p:\mathbb{R}^3\rightarrow [0,1]\) that serves as a per-point confidence measure. Both are fields; that is, they are defined ambiently in \(\mathbb{R}^3\), but we only use their samples at the points of \(\mathscr{P}\). Note that \(f\) is defined the same way!

      The actual input to \(f\) is then \((x,y, z)+\delta\), where we extend the network to also receive \(p\). We then use the following augmented loss:
      $$
      f = \text{argmin}\left(\lambda_{p}\overline{E}_p+\lambda_{n}\overline{E}_n+\lambda_{s}\overline{E}_s +\lambda_d E_d + \lambda_v E_v + \lambda_p E_p\right),
      $$
      where
      \begin{align}
      E_d &= \sum_{i=0}^N{|\delta_i|^2}\\
      E_v &= \sum_{i=0}^N{|\nabla \delta_i|^2}\\
      E_p &= -\sum_{i=0}^N{\log(p_i)}.
      \end{align}
      \(E_d\) regulates the magnitude of \(\delta\); it is zero if we adhere exactly to the original point cloud. \(E_v\) regulates the smoothness of \(\delta\) in space, and \(E_p\) encourages the network to trust the original points according to their confidences (it is exactly \(0\) when all \(p_i=1\)). We use \(\delta_i=\delta(x_i,y_i,z_i)\) and similarly for \(p_i\).

      The original losses are also modified according to \(\delta\) and \(p\): consider \(\overline{f} = f((x,y,z)+\delta)\), then:
      \begin{align}
      \overline{E}_p &= \sum_{i=0}^N{p_i|\overline{f}(x_i,y_i,z_i)|^2}\\
      \overline{E}_n &= \sum_{i=0}^N{p_i|\nabla \overline{f}(x_i,y_i,z_i)-n_i|^2}\\
      \overline{E}_s &= \sum_{j=0}^Q{||\nabla \overline{f}(x_j,y_j,z_j)|-1|^2}
      \end{align}
      By controlling the learnable \(\delta\) and \(p\), we aim to achieve a better loss; that essentially means smoothing the points and trusting those that are less noisy. This should result in quicker fitting, but less accuracy. We then gradually increase \(\lambda_{d,v,p}\) over the iterations so as to subdue \(\delta\) and trust the original point cloud more and more.
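
      One simple way to scale the perturbation back (an assumption on our part, not a prescribed schedule) is to ramp \(\lambda_d\), \(\lambda_v\), and \(\lambda_p\) up linearly over the iterations:

def ramp_weights(iteration, total_iters, start=(0.01, 0.1, 0.01), end=(1.0, 10.0, 1.0)):
    # Linearly increase (lambda_d, lambda_v, lambda_p) from `start` to `end`, so the
    # perturbation delta is subdued and the original points are trusted more over time.
    t = min(iteration / total_iters, 1.0)
    return tuple(s + t * (e - s) for s, e in zip(start, end))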

      Reconstruction Results

      We use a network with higher capacity to represent the perturbation \(\delta\) and the confidence probability \(p\). Specifically, we define a \(2\)-hidden-layer ReLU MLP with a sine for the first activation layer. This allows the network to represent the high-frequency details necessary to approximate the noise \(\delta\).
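
      A PyTorch sketch of such a module (the layer width and sine frequency are our assumptions):

import torch
import torch.nn as nn

class PerturbationNet(nn.Module):
    # 2-hidden-layer MLP with a sine first activation; outputs (delta, p) per point.
    def __init__(self, hidden=128, omega=30.0):
        super().__init__()
        self.omega = omega                  # frequency factor for the sine layer
        self.fc1 = nn.Linear(3, hidden)     # followed by a sine activation
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 4)     # 3 values for delta, 1 for confidence p

    def forward(self, x):
        h = torch.sin(self.omega * self.fc1(x))
        h = torch.relu(self.fc2(h))
        y = self.out(h)
        return y[..., :3], torch.sigmoid(y[..., 3:])   # delta, p in [0, 1]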

      During the experiments, we noticed that the parameters \(\lambda_d\), \(\lambda_v\), and \(\lambda_p\) affect the resulting loss differently. The term \(E_d\) converges easily, so the value of \(\lambda_d\) is chosen to be small. On the other hand, the loss \(E_v\) that regulates smoothness is harder to fit, so we choose a bigger \(\lambda_v\). Finally, the term \(E_p\) is the most sensitive one, since it is defined as a sum of logarithms and small values of \(p\) may lead to \(\texttt{inf}\) problems. For this term, a moderate value must be chosen, so that the confidence stays near \(1\) while the loss still converges.

      As we can see from the image below, our reconstruction still has a way to go; however, there are many ideas we could still test: How do different architectures influence the training? Which function is best for reducing the influence of the adversarial loss terms during training? Are all loss terms really necessary?

      Overall, this is an interesting project that allowed us to explore possible solutions for noisy 3D data, a problem that is still far from being fully explored.

      Categories
      Research

      Exploring NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM

      In the world of computer vision, Simultaneous Localization and Mapping (SLAM) is a critical technique used to create a map of an environment while simultaneously keeping track of a device’s position within it. Traditional SLAM methods rely heavily on geometric models and point cloud data to represent the environment. However, with the advent of deep learning and neural implicit representations, new approaches are revolutionizing SLAM.

      One such approach is NICER-SLAM, which stands for Neural Implicit Scene Encoding for RGB SLAM. NICER-SLAM merges the power of Neural Radiance Fields (NeRFs) and RGB camera inputs to construct accurate 3D maps and deliver more robust tracking performance. This blog will take you through the key concepts, architecture, and applications of NICER-SLAM.

      What is NICER-SLAM?

      NICER-SLAM is a learning-based approach to RGB SLAM, which bypasses traditional point cloud generation and instead encodes the environment using implicit neural networks. Unlike conventional SLAM systems that rely on a feature-based pipeline, NICER-SLAM optimizes a neural network to represent the scene as a continuous function over space. This implicit representation of the scene significantly enhances the accuracy of mapping and tracking tasks using just RGB data.

      Why Neural Implicit Representations?

      Neural implicit representations, such as NeRFs, have recently gained traction due to their ability to represent 3D scenes in a high-resolution, continuous manner without needing explicit 3D models like meshes or voxels. In the context of SLAM, this capability is highly advantageous because it allows for:

      • Compact scene encoding: Neural networks compress the scene data into a small, continuous model.
      • Smooth interpolation: Scenes can be smoothly reconstructed at any viewpoint, avoiding the discretization issues that arise in traditional 3D SLAM techniques.
      • Less reliance on depth sensors: NICER-SLAM performs SLAM operations with RGB cameras, reducing hardware complexity.

      How Does NICER-SLAM Work?

      NICER-SLAM consists of a two-stage architecture: scene representation and pose estimation. Below is a breakdown of the key components of its pipeline:

      1. Scene Representation via Neural Fields

      In NICER-SLAM, the scene is encoded using Neural Radiance Fields (NeRFs), a popular method for implicit scene representation. NeRFs represent scenes as a continuous volumetric field, where every point in space is associated with a radiance value (the color) and density (how much light the point blocks).

      • NeRF-based Scene Model: NICER-SLAM trains a neural network to predict the color and density of points in 3D space, given a viewpoint.
      • Optimization: The network is optimized by minimizing the photometric error between the actual image captured by the camera and the image rendered by the model. This allows a high-fidelity 3D model to be reconstructed from RGB data alone; a toy rendering sketch follows below.
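
      The toy sketch below illustrates the underlying idea of standard NeRF-style volume rendering (it is not NICER-SLAM's actual code; field stands in for any network returning per-point colors and densities):

import torch

def render_ray(field, origin, direction, near=0.1, far=5.0, n_samples=64):
    # Accumulate a pixel color along one ray from (rgb, sigma) predictions.
    t = torch.linspace(near, far, n_samples)
    pts = origin + t[:, None] * direction                  # sample points along the ray
    rgb, sigma = field(pts)                                 # (n, 3) colors, (n,) densities
    delta = t[1] - t[0]                                     # uniform step size
    alpha = 1.0 - torch.exp(-sigma * delta)                 # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                 # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)              # rendered pixel color

      The photometric error is then simply the difference between such rendered pixels and the captured RGB image.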

      2. Pose Estimation and Tracking

      To achieve localization, NICER-SLAM tracks the camera’s pose by continuously adjusting the position of the camera with respect to the environment. It employs a learning-based pose estimation network, which uses the encoded scene and the camera images to predict accurate camera poses.

      • Pose Optimization: The camera pose is iteratively refined by minimizing the error between the projected 3D model and the observed RGB frames, ensuring precise tracking even in challenging environments.
      • Differentiable Rendering: The system uses differentiable rendering to compute the gradients that guide the joint optimization of the scene representation and the pose estimation; a simplified sketch of the pose refinement loop follows below.
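
      A heavily simplified sketch of the pose refinement loop (ours; render_image stands in for the differentiable renderer, and the 6-vector pose parameterization is an assumption):

import torch

def refine_pose(render_image, pose_params, target_image, steps=50, lr=1e-3):
    # pose_params: (6,) tensor, e.g. axis-angle rotation + translation.
    pose = pose_params.detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        rendered = render_image(pose)                               # differentiable rendering
        loss = torch.nn.functional.mse_loss(rendered, target_image) # photometric error
        loss.backward()                                             # gradients flow into the pose
        optimizer.step()
    return pose.detach()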

      Architecture Overview

      NICER-SLAM architecture

      The architecture of NICER-SLAM can be divided into three primary sections:

      1. Neural Implicit Encoder: The neural network that processes RGB inputs to encode the scene into a continuous neural field.
      2. NeRF Scene Representation: Neural Radiance Fields that store the scene in a compact and implicit form.
      3. Pose Estimator: A learning-based network that estimates the camera pose by leveraging both scene geometry and RGB image information.

      Rendering Results

      Below is an example of the rendering results obtained using NICER-SLAM. These results showcase the system’s ability to create detailed, high-resolution maps of the environment with smooth surfaces and accurate color reproduction.

      Rendering result on the self-captured outdoor dataset

      Applications of NICER-SLAM

      The innovations brought by NICER-SLAM open up possibilities in several fields:

      • Autonomous Robotics: Robots can perform high-precision navigation and mapping in unknown environments without relying on depth sensors.
      • Augmented Reality (AR): NICER-SLAM can enable smoother and more accurate scene reconstructions for AR applications, improving the visual fidelity and interaction with virtual objects.
      • 3D Reconstruction: NICER-SLAM’s ability to create dense and continuous 3D maps makes it a strong candidate for tasks in 3D scanning and modeling.

      References:

      1. Zhu, Z., Peng, S., Larsson, V., Cui, Z., Oswald, M. R., Geiger, A., & Pollefeys, M. NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM. International Conference on 3D Vision (3DV), 2024. arXiv preprint.
      2. Mildenhall, B., et al. (2020). NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Proceedings of the European Conference on Computer Vision (ECCV). arXiv preprint.
      3. Zhu, Z., et al. (2021). Learning-Based SLAM: Progress, Applications, and Future Directions. IEEE Robotics and Automation Magazine. arXiv preprint.
      4. Klein, G., & Murray, D. (2007). Parallel Tracking and Mapping for Small AR Workspaces. IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR).