
What it Takes to Get a SLAM Dunk, Part II

This is the second part of a two-part post series prepared by Krishna and me. Krishna presented an overview of SLAM systems in a very intuitive and engaging way. In this part, I explore the future of SLAM systems in endoscopy and how our team plans to shape it.

Collaborators: Krishna Chebolu

Introduction

What about the future of SLAM endoscopy systems? To get insight into where research is heading, we must first discuss the challenges posed by localising an agent and mapping such a difficult environment, as well as the weaknesses of current systems.

On one hand, the environment inside the human body, coupled with data/device heterogeneity and image-quality issues, significantly hinders the performance of endoscopic SLAM systems [1], [2] due to:

1) Texture scarcity and scale ambiguity

2) Illumination variation

3) Bodies (foreign or not), fluids and their movement (e.g., mucus, mucosal movement)

4) Deformable tissues and occlusions

5) Scope-related issues (e.g., imagery quality variability)

6) Underlying scene dynamics (e.g., imminent corruption of frames with severe artefacts, large organ motion and surface drifts)

7) Data heterogeneity (e.g., population diversity, rare or inconspicuous disease cases, variability in disease appearances from one organ to the other, endoscope variability)

8) Differences between device manufacturers

9) The need for expert input for reliable system development

10) The organ preparation process

11) Additional imaging quality issues (e.g., free/non-uniform hand motions and organ movements, different image modalities)

12) Real-time performance (the speed vs. accuracy trade-off)

Current research on endoscopic SLAM systems mainly addresses the first three of the aforementioned challenges: state-of-the-art pipelines focus on estimating depth despite the lack of texture, as well as on handling lighting changes and foreign bodies such as mucus, which can be reflective or move and thus skew the map reconstruction.

Images 1, 2, 3: The images above showcase the three main problems that skew the understanding of tissue structure and hinder the mapping performance of SLAM systems in endoscopy: (1) reflective foreign bodies, (2) lighting variations and (3) lack of texture. Image credits: [3], [3], [4].

On the other hand, we must pinpoint where the weaknesses of such systems lie. The three main modules of AI endoscopy systems that operate on image data are Simultaneous Localization and Mapping (SLAM), Depth Estimation (DE) and Visual Odometry (VO), with the last two being submodules of the broader SLAM systems. SLAM is a computational method that enables a device to map its environment while simultaneously determining its own position within that map. Localisation is often achieved via VO, a technique that estimates the camera's position and trajectory by examining changes across a series of images. Depth estimation is the process of determining the distance between a camera and the objects in its view by analysing visual information from one or more images; it is crucial for SLAM to map the environment accurately in three dimensions. Attempting to use general-purpose SLAM systems on endoscopy data clearly shows that DE and map reconstruction underperform, while localisation/VO is sufficiently captured. This conclusion was reached based on initial experiments; however, further investigation is warranted.
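To make the VO idea above concrete, here is a minimal sketch of frame-to-frame visual odometry using OpenCV feature matching. This is not the pipeline we use; the frame paths and camera intrinsics are placeholders. On texture-scarce endoscopy frames this naive approach quickly degrades, which is exactly the weakness discussed above.

```python
# Minimal sketch of frame-to-frame visual odometry with OpenCV.
# Image paths and the intrinsics matrix K are placeholders, not values
# from our pipeline or from any endoscopy dataset.
import cv2
import numpy as np

K = np.array([[500.0, 0.0, 320.0],   # assumed pinhole intrinsics
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])

img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect and describe ORB features in both frames.
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors and keep the corresponding pixel coordinates.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Estimate relative camera motion (rotation R, unit-scale translation t).
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print("Relative rotation:\n", R, "\nTranslation direction:\n", t.ravel())
```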

Though the challenges and system weaknesses that current research aims to address are critical aspects of the models' usability and performance, there is still a wide gap between the curated settings under which these models perform and real-world clinical settings. Clinical applications are still uncommon, due to the lack of holistic and representative datasets in conjunction with the limited participation of clinical experts. This leads to models that lack generalisability; widely used supervised techniques are data-hungry and require many human annotations, which, apart from being scarce, are often imperfect or overfitted to predominant samples in cohorts. Novel deep learning methods should be steered towards training on diverse endoscopic datasets, the explainability of results and the interpretability of models, all of which are required to accelerate this field. Finally, suitable evaluation metrics (i.e., generalisability assessments and robustness tests) should be defined to determine the strength of developed methods with regard to clinical translation.

For a future of advanced and applicable AI endoscopy systems, the directions are clear, as discussed in [1]:

1) Endoscopy-specific solutions must be developed, rather than just applying pipelines from the computer vision field

2) Robustness and generalisation evaluation metrics of the developed solutions must be defined to set the standard to assess and compare model performance

3) Practicability, compactness and real-time effectiveness should also be quantified

4) More challenging problems should be explored (subtle lesions instead of apparent lesions)

5) The developed models should be able to adapt to datasets produced in different clinics, using different endoscopes, in the context of varying manifestations of diseases

6) Multi-modal and multi-scale integration of data should be incorporated in these systems

7) Clinical validation is necessary to steadily integrate these systems in the clinical process

Method

But how do we envision the future of SLAM endoscopy systems?

Our team aims to directly address the issues of texture scarcity, illumination variation and the handling of foreign bodies, while indirectly combating some of the remaining challenges. Building upon state-of-the-art SLAM systems, which already handle localisation/VO sufficiently, we aim to further enhance their mapping process by integrating a state-of-the-art endoscopy monocular depth estimation pipeline [3] and by developing a module that understands lighting variations in the context of endoscopic image analysis.

The aforementioned module will have a corrective nature, automatically adjusting the lighting in the captured images to ensure that the visuals are clear and consistent. Potentially, it could also enhance image quality by adjusting brightness, contrast and other image parameters in real time, standardising the images across different frames of the endoscopy video. As the module's task is to improve the visibility and consistency of image features, it would consequently also support the depth estimation process, providing clearer cues and contrast for accurate depth calculations by the endoscopy monocular depth estimation pipeline. The module would thus ensure a more consistent and refined input to the SLAM model, rather than raw endoscopy data, which suffer from inconsistencies and heterogeneities the model has never seen before.

With the aforementioned integrations we aim to develop a specialised SLAM endoscopy system and test it in the context of clinical colonoscopy [5]. The plan is to first train and test our pipeline on a curated dataset to assess its performance under controlled settings; it would then be of great interest to adjust each part of the pipeline to make it perform on real-world clinical data or across multiple datasets. This will provide us with the opportunity to see where a state-of-the-art SLAM endoscopy system stands in terms of real-world applicability, and will help quantify and address the issues explored in the previous section.
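The lighting-correction module is still at the design stage. As one possible starting point, the minimal sketch below normalises per-frame illumination with CLAHE applied to the lightness channel using OpenCV. This is an assumed baseline, not our final module, and the frame path is a placeholder.

```python
# Minimal sketch of per-frame illumination normalisation, one possible
# starting point for the lighting-correction module described above
# (an assumption, not our final design). The frame path is a placeholder.
import cv2

def normalise_lighting(frame_bgr, clip_limit=2.0, tile_grid=(8, 8)):
    """Equalise local contrast on the lightness channel only,
    leaving colour information untouched."""
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    l_eq = clahe.apply(l)
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)

frame = cv2.imread("colonoscopy_frame.png")
corrected = normalise_lighting(frame)
cv2.imwrite("colonoscopy_frame_corrected.png", corrected)
```

A corrective module of this kind would sit in front of both the depth estimation pipeline and the SLAM back end, so that every downstream component sees frames with more consistent illumination.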

Image 4: State-of-the-art clinical mesh reconstruction using the endoscopy monocular depth estimation pipeline [5].

Image 5: The endoscopy monocular depth estimation pipeline also produces state-of-the-art depth estimates for endoscopy videos.

Colonoscopy Data

The type of endoscopy procedure we chose to develop our pipeline for is colonoscopy: a medical procedure that uses a flexible fibre-optic instrument equipped with a camera and light (a colonoscope) to examine the interior of the colon and rectum. More specifically, we chose to work with the Colonoscopy 3D Video Dataset (C3VD) [5]. The significance of this dataset is that it provides high-quality ground-truth data, obtained using a high-definition clinical colonoscope and high-fidelity colon models, creating a benchmark for computer vision pipelines. The study introduced a novel multimodal 2D-3D registration technique to register optical video sequences with ground-truth rendered views of a known 3D model.
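Ground-truth depth is what makes C3VD so useful for evaluating mapping: given a depth map and camera intrinsics, each frame can be back-projected into a 3D point cloud and compared against a reconstruction. The sketch below shows that back-projection under an assumed pinhole model; the file name, depth scale and intrinsics are placeholders and do not match the actual C3VD calibration (the real colonoscope optics differ).

```python
# Minimal sketch: back-projecting a ground-truth depth map into a 3D
# point cloud with a pinhole camera model. File name, depth scale and
# intrinsics are placeholders, not the actual C3VD calibration.
import cv2
import numpy as np

depth = cv2.imread("frame_0000_depth.tiff", cv2.IMREAD_UNCHANGED).astype(np.float32)
fx, fy, cx, cy = 700.0, 700.0, 540.0, 540.0  # assumed pinhole intrinsics

h, w = depth.shape
u, v = np.meshgrid(np.arange(w), np.arange(h))

# Pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
z = depth
x = (u - cx) * z / fx
y = (v - cy) * z / fy
points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
points = points[z.reshape(-1) > 0]  # drop invalid (zero-depth) pixels
print(points.shape)  # (N, 3) point cloud for one frame
```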

Video 1: C3VD dataset: Data from the colonoscopy camera (left) and depth estimation (right) extracted by a Generative Adversarial Network (GAN). Video credits: [5]

Conclusion

SLAM systems are the state of the art for localisation and mapping, and endoscopy is the gold-standard procedure for many hollow organs. Combining the two, we get a powerful medical tool that can not only improve patient care but also be life-defining in some cases. Its use cases can be prognostic, diagnostic, monitoring-related and even therapeutic, ranging from, but not limited to: disease surveillance, inflammation monitoring, early cancer detection, tumour characterisation, resection procedures, minimally invasive treatment interventions and therapeutic response monitoring. With the development of SLAM endoscopy systems, the endoscopy surgeon has acquired a visual overview of environments inside the human body that would otherwise be impossible to obtain. Because endoscopy is highly operator-dependent, with grim clinical outcomes in some disease cases, reliable and accurate automated guidance is imperative. Thus, in recent years, there has been a significant increase in the publication of endoscopic imaging-based methods within the fields of computer-aided detection (CADe), computer-aided diagnosis (CADx) and computer-assisted surgery (CAS). Going forward, newly designed methods must be more generalisable to unseen noisy data, patient population variability and variable disease appearances, answering the multi-faceted challenges that the latest models fail to address under actual clinical settings.

This post concludes Part II of What it Takes to Get a SLAM Dunk.

Image 6: Michael Jordan (considered by me and many as the G.O.A.T.) performing his most famous dunk. Image credits: ScienceABC

References

[1] Ali, S. Where do we stand in AI for endoscopic image analysis? Deciphering gaps and future directions. npj Digit. Med. 5, 184 (2022). https://doi.org/10.1038/s41746-022-00733-3

[2] Ali, S., Zhou, F., Braden, B. et al. An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy. Sci Rep 10, 2748 (2020). https://doi.org/10.1038/s41598-020-59413-5

[3] Paruchuri, A., Ehrenstein, S., Wang, S., Fried, I., Pizer, S. M., Niethammer, M., & Sengupta, R. Leveraging near-field lighting for monocular depth estimation from endoscopy videos. In Proceedings of the European Conference on Computer Vision (ECCV), (2024). https://doi.org/10.48550/arXiv.2403.17915

[4] https://commons.wikimedia.org/wiki/File:Stomach_endoscopy_1.jpg

[5] Bobrow, T. L., Golhar, M., Vijayan, R., Akshintala, V. S., Garcia, J. R., & Durr, N. J. Colonoscopy 3D video dataset with paired depth from 2D-3D registration. Medical Image Analysis, 102956 (2023). https://doi.org/10.48550/arXiv.2206.08903


What it Takes to Get a SLAM Dunk, Part I

In this two-part post series, Nicolas and I dive deeper into SLAM systems, our project's focus for the past two weeks. In this part, I introduce SLAM systems and cover their evolution. In the next part, Nicolas builds on that foundation by discussing the future. By the end of both parts, we should be able to give you an overview of What it Takes to Get a SLAM Dunk.

Collaborators: Nicolas Pigadas

Introduction

Simultaneous Localization and Mapping (SLAM) systems have become a standard in various technological fields, from autonomous robotics to augmented reality. In recent years, however, this technology has found a unique application in medical imaging: endoscopic video. But what is SLAM?

Figure 1: A sample image using SLAM reconstruction from SG News Desk.

SLAM systems were conceptualized in robotics and computer vision for navigation purposes. Before SLAM, these fields employed more elementary methods.

Figure 2: Example of large-scale 3D semantic mapping by a vehicle.

You may be thinking: Krishna, it sounds like you just described SLAM systems. You are right, but localizing and mapping were separate processes. So a robot would go through the pains of a Heisenberg-like principle, i.e., the robot would either localize or map, with the "or" being exclusive.

What the next step in research would be was fairly obvious, yet still daunting. Before we SLAM dunk our basketball, we must first do a few lay-ups and free throws.

Precursors to SLAM

Here are some inspirations that contributed to the development of SLAM:

  • Probabilistic robotics: The introduction of probabilistic approaches, such as Bayesian filtering, allowed robots to estimate their position and map the environment with a degree of uncertainty, paving the way for more integrated systems.
  • Kalman filtering: a mathematical technique for estimating the state of a dynamic system. It allowed for continuous estimation of a robot's position and remained robust to noisy sensor data (a minimal sketch follows Figure 3 below).
  • Cognitive Mapping in Animals: Research in cognitive science and animal navigation provided theoretical inspiration, particularly the idea that animals build mental maps of their environment while simultaneously keeping track of their location.
Figure 3: Spatial behavior and cognitive mapping of mice with aging. Image from Nature.
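To ground the Kalman filtering idea from the list above, here is a minimal 1D sketch that fuses a constant-velocity prediction with noisy position measurements. The motion model and noise levels are illustrative assumptions, not tied to any particular robot.

```python
# Minimal 1D Kalman filter sketch: a robot moving along a line fuses a
# constant-velocity prediction with noisy position measurements.
import numpy as np

def kalman_1d(measurements, dt=1.0, process_var=1e-3, meas_var=0.5):
    x = np.array([0.0, 0.0])               # state: [position, velocity]
    P = np.eye(2)                          # state covariance
    F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity motion model
    Q = process_var * np.eye(2)            # process noise
    H = np.array([[1.0, 0.0]])             # we only measure position
    R = np.array([[meas_var]])             # measurement noise
    estimates = []
    for z in measurements:
        # Predict the next state from the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update the prediction with the new measurement.
        y = z - H @ x                      # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0])
    return estimates

true_pos = np.arange(0, 10, 1.0)
noisy = true_pos + np.random.normal(0, 0.7, size=true_pos.shape)
print(kalman_1d(noisy))
```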

SLAM Dunk – A Culmination (some real Vince Carter stuff)

Finally, many researchers agreed that the separation of localizing and mapping was ineffective, and great efforts went into their integration: SLAM was developed. The goal was to enable systems to explore and understand an unknown environment autonomously; to do so, they needed to localize and map the environment simultaneously, with each task informing and improving the other.

With its unique ability to localize and map, researchers found uses for SLAM in any sensory device. Some of SLAM's earlier uses, in the late 80s and early 90s, were sensor-based, with data coming from range finders, sonar, and LIDAR. It is good to note that the algorithms were computationally intensive, and still are.

As technology evolved, vision-based SLAM emerged. This shift was inspired by the human visual system, which navigates the world primarily through sight, enabling more natural and flexible mapping techniques.

Key Milestones

With the latest iterations of SLAM being exponentially better than the originals, it is important to recognize the journey. Here are notable SLAM systems:

  • EKF-SLAM (Extended Kalman Filter SLAM): One of the earliest and most influential SLAM algorithms, EKF-SLAM, laid the foundation for probabilistic approaches to SLAM, allowing for more accurate mapping and localization.
  • FastSLAM: Introduced in the early 2000s, FastSLAM utilized particle filters, making it more efficient and scalable. This development was crucial in enabling real-time SLAM applications.
  • Visual SLAM: The transition to vision-based SLAM in the mid-2000s opened new possibilities for the technology. Visual SLAM systems, such as PTAM (Parallel Tracking and Mapping), enabled more detailed and accurate mapping using standard cameras, a significant step toward broader applications.
Figure 4: Left LSD-SLAM, right ORB-SLAM. Image found in fzheng.me

From Robotics to Endoscopy (Medical Vision)

As SLAM technology matured, researchers explored its potential beyond traditional robotics. Medical imaging, particularly endoscopy, presented a fantastic opportunity for SLAM. Endoscopy is a medical procedure involving a flexible tube with a camera to visualize the body’s interior, often within complex and dynamic environments like the gastrointestinal tract. 

Figure 5: Endoscopy procedure overview. Image from Johns Hopkins Medicine.

It is fairly easy to see why SLAM could be applied to endoscopic and endoscopy-like procedures to gain insights and make more medically informed decisions. Early work focused on using visual SLAM to navigate the gastrointestinal tract, where the narrow and deformable environment presented significant challenges.

One of the first successful implementations involved using SLAM to reconstruct 3D maps of the colon during colonoscopy procedures. This approach improved navigation accuracy and provided valuable information for diagnosing conditions like polyps or tumors.

Researchers also explored the integration of SLAM with other technologies, such as optical coherence tomography (OCT) and ultrasound, to enhance the quality of the maps and provide additional layers of information. These efforts laid the groundwork for more advanced SLAM systems capable of handling the complexities of real-time endoscopic navigation.

Figure 6: Visual of Optical Coherence Tomography from News-Medical.

Endoscopy SLAMs – What Our Group Looked At

As a part of our study, we looked at some presently used and state-of-the-art SLAM systems. Below are the three that various members of our team attempted:

  • NICER-SLAM (RGB): a dense RGB SLAM system that simultaneously optimizes for camera poses and a hierarchical neural implicit map representation, which also allows for high-quality novel view synthesis.
  • ORB-SLAM3 (RGB): (there is also ORB-SLAM and ORB-SLAM2) ORB-SLAM3 is the first real-time SLAM library able to perform Visual, Visual-Inertial, and Multi-Map SLAM with monocular, stereo, and RGB-D cameras, using pin-hole and fisheye lens models. In all sensor configurations, ORB-SLAM3 is as robust as the best systems available in the literature and significantly more accurate.
  • DROID-SLAM (RGB): a deep learning-based SLAM system. DROID-SLAM consists of recurrent iterative updates of camera pose and pixel-wise depth through a Dense Bundle Adjustment layer.
Figure 7: Demo pictures from Gaussian Splatting SLAM.

Some other SLAM systems that our team would have loved to try our hand at are:

  • Gaussian Splatting SLAM: the first application of 3D Gaussian Splatting in monocular SLAM, the most fundamental but hardest setup for Visual SLAM.
  • GlORIE-SLAM: Globally Optimized RGB-only Implicit Encoding Point Cloud SLAM. This system uses a deformable point cloud as the scene representation and achieves lower trajectory error and higher rendering accuracy compared to competitive approaches.

More SLAM methods can be found in this survey.

Figure 8: Visuals from GlORIE-SLAM.

Conclusion

This concludes part 1 of What it Takes to Get a SLAM Dunk. This post should have given you a gentle, but robust-enough introduction to SLAM systems. Vince Carter might even approve.

Figure 9: An homage to Vince Carter, arguably the greatest dunk-er ever. Image from Bleacher Report.

A Deeper Understanding of OpenAI's CLIP Model

Author: Krishna Chebolu
Teammates: Bethlehem Tassew and Kimberly Herrera
Mentor: Dr. Ankita Shukla

Introduction

For the past two weeks, our project team, mentored by Dr. Ankita Shukla, set out to understand the inner workings of OpenAI's CLIP model. Specifically, we were interested in gaining a mathematical understanding of the geometric and topological properties of its feature spaces.

OpenAI's CLIP (Contrastive Language-Image Pre-Training) is a versatile and powerful model designed to jointly understand text and images. CLIP is trained to connect text and images by learning from a large dataset of images paired with their corresponding textual descriptions. The model is trained using a contrastive learning approach, where it learns to predict which text snippet is associated with which image from a set of possible pairs. This allows CLIP to understand the relationship between textual and visual information.

Figure 1: OpenAI’s CLIP architecture as it appears in the paper. CLIP pre-trains an image encoder and a text encoder to predict which images were paired with which texts in our dataset. Then OpenAI uses this behavior to turn CLIP into a zero-shot classifier. We convert all of a dataset’s classes into captions such as “a photo of a dog” and predict the class of the caption CLIP estimates best pairs with a given image.

CLIP uses two separate encoders: a text encoder (based on the Transformer architecture) and an image encoder (based on a convolutional neural network or a vision transformer). Both encoders produce embeddings in a shared latent space (also called a feature space). By aligning text and image embeddings in the same space, CLIP can perform tasks that require cross-modal understanding, such as image captioning, image classification with natural language labels, and more.

CLIP is trained on a vast dataset containing 400 million image-text pairs collected online. This extensive training data allows it to generalize across various domains and tasks. One of CLIP’s standout features is its ability to perform zero-shot learning. It can handle new tasks without requiring task-specific training data, simply by understanding the task description in natural language. More information can be found in OpenAI’s paper.
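As an illustration of the zero-shot behaviour described above, the minimal sketch below scores a single image against natural-language prompts using OpenAI's released clip package. The image path and candidate labels are placeholders, not our experimental setup.

```python
# Minimal sketch of CLIP zero-shot classification with OpenAI's released
# "clip" package (pip install git+https://github.com/openai/CLIP.git).
# The image path and candidate labels are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["animal", "human", "sport", "nature", "vehicle"]
prompts = clip.tokenize([f"a photo of a {c}" for c in labels]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    # Similarity logits between the image and each caption prompt.
    logits_per_image, _ = model(image, prompts)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
```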

In our attempts to understand the inner workings of the feature spaces, we employed tools such as UMAP, persistent homology, subspace angles, cosine similarity matrices, and Wasserstein distances.

Our Study – Methodology and Results

All of our teammates started with datasets that contained image-caption pairs. We classified images into various categories using their captions and embedded them using CLIP. Then we used UMAP or t-SNE plots to visualize their characteristics.

Figure 2: A screenshot of a UMAP of 1000 images from the Flickr 8k dataset from Kaggle divided into five categories (animal, human, sport, nature, and vehicle) is shown. Here we can also observe that the images (colored) are embedded differently than their corresponding captions (gray). Although not shown here, the captions are also clustered around categories.
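For reference, the minimal sketch below mirrors this embed-then-visualise step: images and captions are encoded with CLIP and the joint embedding space is projected to 2D with UMAP. The file paths, captions and UMAP settings are placeholders, not the actual Flickr 8k preprocessing.

```python
# Minimal sketch of the embed-then-visualise step: encode images and
# captions with CLIP, then project the joint embedding space with UMAP.
# File paths, captions and UMAP settings are placeholders.
import torch
import clip
import numpy as np
import umap
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image_paths = ["img_0.jpg", "img_1.jpg"]           # placeholder paths
captions = ["a dog running", "a child playing"]    # placeholder captions

with torch.no_grad():
    imgs = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)
    img_emb = model.encode_image(imgs)
    txt_emb = model.encode_text(clip.tokenize(captions).to(device))

# Stack both modalities, unit-normalise, and reduce to 2D for plotting.
X = torch.cat([img_emb, txt_emb]).cpu().numpy()
X = X / np.linalg.norm(X, axis=1, keepdims=True)
coords = umap.UMAP(n_components=2, n_neighbors=2, random_state=0).fit_transform(X)
print(coords.shape)  # (n_images + n_captions, 2)
```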

After this preliminary visualization, we desired to delve deeper. We introduced noise, in the form of a Gaussian blur, to our images to test CLIP's robustness. We added the noise in increments (for example, mean = 0, standard deviation = {1,2,3,4,5}) and encoded the blurred images as we did the original image-caption pairs. We then built persistence diagrams using ripser. We also followed the same procedure within the various categories to understand how noise impacts not only the overall space but also the respective subspaces. These diagrams for the five categories from the Flickr 8k dataset can be found in this Google Colab notebook.

Figure 3: The same 1000 images from the Flickr 8k dataset with increasing noise are seen above. Visually, no significant difference can be observed. The standard deviation of the Gaussian blur increases from left to right.

Visually, you can observe that there is no significant difference, which attests to CLIP's robustness. However, visual assurance is not enough. Thus, we used SciPy's Wasserstein distance calculation to quantify how different each persistence diagram is from the others. Continuing with the same Flickr 8k dataset, for each category we obtain the values shown in Figure 4.

Figure 4: Wasserstein distances in each category. You can see the distance between original images with respect to images with Gaussian blur of std. dev = 1 is not high compared to std. dev = 2 or above. This implies that the original set is not as different from the set of images blurred with std. dev. = 1 as compared to std. dev. = 2 which in turn is not as different as the set of blurred images with std. dev. = 3, and so on. This property holds for all five categories.
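The sketch below outlines one way to implement this comparison: build persistence diagrams with ripser and compare their lifetime (death minus birth) distributions with SciPy's 1D Wasserstein distance. The random arrays stand in for the CLIP embeddings of the original and blurred images from the encoding step sketched earlier, and the choice of homology dimension and of comparing lifetimes are assumptions rather than the team's exact protocol.

```python
# Hedged sketch of the robustness comparison: persistence diagrams via
# ripser, compared through the 1D Wasserstein distance of their lifetimes.
# The embeddings below are random stand-ins for CLIP features of the
# original and blurred images; dimension/comparison choices are assumptions.
import numpy as np
from ripser import ripser
from scipy.stats import wasserstein_distance

def lifetimes(point_cloud, dim=0):
    """Persistence lifetimes (death - birth) in homology dimension `dim`.
    Higher dimensions can be analysed the same way."""
    dgm = ripser(point_cloud, maxdim=dim)["dgms"][dim]
    finite = dgm[np.isfinite(dgm[:, 1])]   # drop the infinite bar(s)
    return finite[:, 1] - finite[:, 0]

rng = np.random.default_rng(0)
emb_original = rng.normal(size=(200, 512))                          # stand-in embeddings
emb_blurred = emb_original + rng.normal(scale=0.05, size=(200, 512))  # "blurred" stand-in

d = wasserstein_distance(lifetimes(emb_original), lifetimes(emb_blurred))
print(f"Wasserstein distance between lifetime distributions: {d:.4f}")
```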

Another question is how similar the categories are to one another. This question can be answered by calculating subspace angles. After embedding, each category can be seen as occupying a space that can often be far away from another category's space; we want to quantify how far away, so we use subspace angles. Results for the Flickr 8k dataset example are shown in Figure 5.

Figure 5: Subspace angles of each category pair among the five categories introduced earlier in the post. All values are displayed in degrees. You can observe that the angle between human- and animal-category images is ~0.92°, compared to ~2.3° between human and nature, which makes sense, as humans and animals are more similar to each other than humans are to nature. It is worthwhile to note that the five categories simplify the dataset too much, as they do not capture the nuances of the captions. More categories, or richer descriptions of categories, would lead to more accurate subspace-angle values.
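For completeness, here is a minimal sketch of the subspace-angle computation using scipy.linalg.subspace_angles. The random arrays again stand in for per-category CLIP embeddings; in practice one might first reduce each category to its leading principal components before comparing.

```python
# Minimal sketch of comparing two embedding subspaces with
# scipy.linalg.subspace_angles. The embeddings are random stand-ins for
# the per-category CLIP embeddings described above.
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
emb_human = rng.normal(size=(200, 512))   # stand-in: 200 human-category embeddings
emb_animal = rng.normal(size=(180, 512))  # stand-in: 180 animal-category embeddings

# Each category's embeddings span a subspace of the 512-D feature space;
# subspace_angles expects the columns to span the subspaces, hence the transpose.
angles_rad = subspace_angles(emb_human.T, emb_animal.T)
print(f"Largest principal angle: {np.degrees(angles_rad[0]):.2f} degrees")
```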

Conclusion

At the start, our team members were novices in the CLIP model, but we concluded as lesser novices. Through the two weeks, Dr. Shukla supported us and enabled us to understand the inner workings of the CLIP model. It is certainly thrilling to observe how AI around us is constantly evolving, but at the heart of it is mathematics governing the change. We are excited to possibly explore further and perform more analyses. 


An SGI to SGI 2024

With week one wrapped up, the SGI 2024 fellows embark on a first adventure into their respective research projects. Today, most of the teams would have already had their first meeting. As we dive deeper into various topics, I wish to write a record of week one, our tutorial week.

Our first day, Monday, 8 July 2024, began with a warm welcome by Dr. Justin Solomon, SGI Director. Without wasting any time, we dove into the introductory material of geometry processing with the guidance of Dr. Oded Stein, who also served as the tutorial week chair. We then had a session on visualizing 3D data with Dr. Qingnan Zhou, a research engineer at Adobe Research.

It is one of the guiding philosophies of SGI that many of the fellows come from various backgrounds. I thought to myself, "not everyone will be at the same skill level." To my pleasant surprise, Dr. Stein's material covered the absolute basics, bringing everyone on the call to the same page, or rather the same presentation slide. The remaining four days followed the same principle, which is something I found admirable.

Our second day, slightly more complicated, was all about parameterization. The introduction was delivered by Richard Liu, a Ph.D. student at the University of Chicago, followed by a lecture on texture maps using PyTorch and texture optimization by Dale Decatur, also a Ph.D. student at UChicago. As part of their lecture, Richard and Dale also assisted in setting up Jupyter notebooks to complete the exercises, which was a great help to those new to using such tools.

Since day two was slightly more complicated, there were many great, deep questions about the material. I want to point out the commendable job of the TAs and lecturers themselves on their quick turnaround, succinct answers, and vast resourcefulness. So many papers and articles were exchanged to enhance understanding!

Our third day was a day of realism. Soon-to-be faculty member at Columbia University, Silvia Sellán, covered how the academic community represents 2D and 3D shapes. Silvia emphasized the needs of the industry and academic community and offered the pros and cons of each representation. It all came down to what a problem needs and what data is available to solve it. Silvia also offered a taste of alternate applications of common methods and encouraged the 2024 fellows to do the same as they pursue research. The day ended with a guest appearance by Towaki Takikawa, a Ph.D. student from the University of Toronto. Towaki spoke about alternate geometry representations and neural fields with some live demos.

Day four dove deeper into geometry processing algorithms, with a special focus on level-of-detail methods, which, once understood, are a really intuitive and neat thing to have in our toolbox! This material was taught by Derek Liu, a research scientist at Roblox Inc., and Eris Zhang, a Ph.D. student at Stanford. These talks were the most technical of the tutorial week. I think all the fellows appreciated Derek's and Eris's willingness to stay back toward the end of the day and assist everyone with the exercises.

Our last day was with Dr. Nicholas Sharp, a research scientist at NVIDIA who is also the author of a community-favorite software program, Polyscope. Dr. Sharp focused on what I think is the most important skill in all of programming: debugging. What if we get bad data? How do we deal with data that is beyond what is ideal? Day five was all about code-writing practices and debugging relating to these questions. While this day was technical, Dr. Sharp made it intuitive and easy-to-digest. We also had a complementary session by guest speaker Zachary Ferguson, a postdoc at MIT, on dealing with floating points in collision detection.

Five days of studying all kinds of new things. I, for one, am excited to work on my first project. With such a robust introduction, I feel more confident, as I am sure do others. Good luck to all the fellows, mentors, and volunteers!


The Symposium on Geometry Processing 2024 – A Recap

The Symposium on Geometry Processing (SGP) is an annual event highly awaited in the geometry community. Researchers, enthusiasts, and newcomers alike flew in from all over the globe to attend and enjoy the experience put together by this year's organizing committee. As an incoming Summer Geometry Initiative fellow, when the opportunity to receive a travel grant to attend SGP 2024 arose, I immediately applied. A few days later, I received an email stating that I was a recipient, and almost a month later, I landed in Boston to be among 150 fellow participants.

Like most years, SGP took place in two parts this year: the graduate school from June 22-23, 2024, and the technical symposium from June 24-26, 2024, in MIT's CSAIL building in Cambridge, Massachusetts, USA.

As a novice in geometry processing, I found the graduate school a particularly helpful and robust introduction to the field. It was packed with foundational material that focused on an intuitive understanding of how geometric data is represented and processed. The motivation and goals of geometry processing were well laid out. I particularly enjoyed the talks on equivariant neural networks, sketch processing, introduction to geometry processing research in Python, and Monte Carlo methods. The talks had a much-needed balance of introductory, interactive material and in-depth analysis of useful methods. During this time, I also met many other novices like myself, as well as those who push the frontier of geometry.

Equipped with the basics, I felt more confident attending the main event, the symposium. My favorites, right off the bat, were the three keynote speakers. They accomplished the great feat of inspiring an audience mixed with experts and beginners and bringing them to the same page. Dr. Xin Tong, Dr. Alec Jacobson, and Dr. Josephine Carstensen each covered complementary avenues of research: the broader scope, future directions, and innovative applications. The remaining 15-16 papers presented at the symposium were impressive in their own right. While the selected papers covered a range of material, they all had one admirable thing in common: they were obvious by-products of passionate research.

The social events were also a unique experience. Speaking to the presenters, organizers, fellow student researchers, and industry folks in the historic setting of the Boston Tea Party Ships and Museum and the innovative displays of MIT's science museum set this symposium apart from other math conferences I have previously attended. I enjoyed talking to the presenters and getting to know them during and after the symposium. I built meaningful connections, with all credit going to the conference organizers for facilitating such interaction.

For me, the most interesting problem is how data is represented. As someone who has worked in bioinformatics and at Boeing's Phantom Works Estimating, I know the quality of data is immensely important. A common thread is that data is computationally and financially expensive to acquire, so existing data is relied upon heavily. Dr. Tong's talk emphasized the need for better data, and the papers reinforced this idea when it came to future directions. I have always been interested in better data representation and optimization for pipeline processing, but now I am equipped with better direction and motivation.

I am grateful to have been the recipient of the SGP 2024 Student Travel Grant. I believe that I exhausted its resources to the best of my ability. The five-day event left a mark of inspiration on me that I could not erase if I tried.