Figure 1: Comparison of distance estimation by heat methods. Standard diffuse-then-log (middle) suffers from numerical issues when the heat is small (large geodesic distances), leading to noisy and inaccurate isolines. Our proposed log-space diffusion (right) performs diffusion directly in the log-heat domain, producing smooth and accurate estimates, closely matching the ground-truth distances (left). It is worth noting that our solver outperforms others primarily for smaller time step values.
Introduction
The heat equation is one of the canonical diffusion PDEs: it models how “heat” (or any diffusive quantity) spreads out over space as time progresses. Mathematically, let \(u(x,t) \in \mathbb{R}_{\geq 0}\) denote the temperature at spatial location \(x \in \mathbb{R}^{d}\) and time \(t \in \mathbb{R}\). The continuous heat equation is given by
\[\frac{\partial u}{\partial t} = \Delta u,\]
where \(\Delta u (x, t) = \sum_{i=1}^{d} \frac{\partial^2 u}{\partial x_i^2}\) is the Laplace operator representing how local temperature gradients drive heat from hotter to colder regions. On a surface mesh, the spatial domain is typically discretized so that the PDE reduces to a system of ODEs in time, which can be integrated using stable and efficient schemes such as backward Euler. Interestingly, in many applications—including geodesic distance computation 1 and optimal transport 2—it is often more useful to work with the logarithm of the heat rather than the heat itself. For example, Varadhan’s formula states that the geodesic distance \(\phi\) between two points \(x\) and \(x^*\) on a curved domain can be recovered from the short-time asymptotics of the heat kernel:
where \(k_{t,x^{*}}(x) = u(x,t)\) is the heat distribution at time \(t\) resulting from an initial Dirac delta at \(x^*\), i.e., \(u(x,0) = \delta(x-x^{*})\). In practice, this leads to a two-step workflow: (1) simulate the heat equation to obtain \(u_t\), and (2) take the logarithm to recover the desired quantity. However, this approach can be numerically unstable in regions where \(u\) is very small: tiny relative errors in \(u\) can be greatly magnified in the log domain. For example, an absolute error of \(10^{-11}\) in \(u\) becomes an error of roughly \(25.33\) in \(\log u\).
A natural idea is therefore to evolve the logarithm of the heat directly. Let \(u>0\) satisfy the heat equation and define \(w = \log u\). One can show that \(w\) obeys the nonlinear PDE 3
\[\frac{\partial w}{\partial t} = \Delta w + \|\nabla w\|_2^2.\]
This formulation has the appealing property that it avoids taking the logarithm after simulation, thereby reducing the amplification of small relative errors in \(u\) when \(u\) is tiny. However, the price of this reformulation is the additional nonlinear term \(\|\nabla w\|_2^2\), which complicates time integration as each step requires iterative optimization, and introduces a discrepancy with the discretized heat equation. In practice, evaluating \(\|\nabla w\|_2^2\) requires a discrete gradient operator, and the resulting ODE is generally not equivalent to the one obtained by first discretizing the heat equation for \(u\) (which involves only a discrete Laplacian) up to a log transform. This non-equivalence stems from (1) there is a new spatially discretized gradient operator, and (2) the fact that the logarithm is nonlinear and does not commute with discrete differential operators.
Motivated by this discrepancy, we take a different approach that preserves the exact relationship to the standard discretization. We first discretize the heat equation in space, yielding the familiar linear ODE system for \(u\). We then apply the transformation \(w = \log u\) to this discrete system, producing a new ODE for \(w\). Now, by construction, the new ODE is equivalent to the heat ODE up to a log transform of variable. This still allows us to integrate directly in log-space, which allows more accurate and stable computation. We show that this formulation offers improved numerical robustness in regions where \(u\) is very small, and demonstrate its effectiveness in applications such as geodesic distance computation.
Table 1: Side-by-side procedures for computing \(\log u\) in heat diffusion, note the highlighted steps are the same for standard heat diffusion and ours.
Standard (post-log)
Continuous log-PDE
Proposed discrete-log
1. Start with the heat PDE for \(u\).
1. Start with the log-heat PDE for \(w=\log u\).
1. Start with the heat PDE for \(u\).
2. Discretize in space \(\Rightarrow\) ODE for \(u\).
2. Discretize in space for \(\nabla\) and \(\Delta\) \(\Rightarrow\) ODE for \(w\).
2. Discretize in space \(\Rightarrow\) ODE for \(u\).
3. Time-integrate in \(u\)-space.
3. Time-integrate in \(w\)-space and output \(w\).
3. Apply change of variables \(w=\log u\) \(\Rightarrow\) ODE for \(w\).
4. Compute \(w=\log u\) as a post-process.
4. Time-integrate in \(w\)-space and output \(w\).
Background
We begin by reviewing how previous approaches are carried out, focusing on the original heat method 1 and the log-heat PDE studied in 3. In particular, we discuss two key stages of the discretization process: (1) spatial discretization, where continuous differential operators such as the Laplacian and gradient are replaced by their discrete counterparts, turning the PDE into an ODE; and (2) time discretization, where a numerical time integrator is applied to solve the resulting ODE. Along the way, we introduce notation that will be used throughout.
Heat Method
Spatial discretization. In the heat method 1, the heat equation is discretized on triangle mesh surfaces by representing the heat values at mesh vertices as a vector \(u \in \mathbb{R}^{|V|}\), where \(|V|\) is the number of vertices. A corresponding discrete Laplace operator is then required to evolve \(u\) in time. Many Laplacian discretizations are possible, each with its own trade-offs 4. The widely used cotangent Laplacian is written in matrix form as \(M^{-1}L\), where \(M, L \in \mathbb{R}^{|V| \times |V|}\). The matrix \(L\) is defined via
where \(i,j\) are vertex indices and \(\alpha_{ij}, \beta_{ij}\) are the two angles opposite the edge \((i,j)\) in the mesh. The mass matrix \(M = \mathrm{diag}(m_1, \dots, m_{|V|})\) is a lumped (diagonal) area matrix, with \(m_i\) denoting the area associated with vertex \(i\). Note that \(L\) has several useful properties: it is sparse, symmetric, and each row sums to zero. This spatial discretization converts the continuous PDE into an ODE for the vector-valued heat function \(u(t)\):
\[\frac{\mathrm{d}u}{\mathrm{d}t} = M^{-1}L\,u.\]
Time discretization and integration. The ODE is typically advanced in time using a fully implicit backward Euler scheme, which is unconditionally stable. For a time step \(\Delta t = t_{n+1} – t_n\), backward Euler yields:
\[(M – \Delta t\,L)\,u_{n+1} = M\,u_n,\]
where \(u_n\) is given and we solve for \(u_{n+1}\). Thus, each time stepping requires solving a sparse linear system with system matrix \(M – \Delta t\,L\). Since this matrix remains constant throughout the simulation, it can be factored once (e.g., by Cholesky decomposition) and reused at every iteration, greatly improving efficiency.
Log Heat PDE
Spatial discretization. In prior work 3 on the log-heat PDE, solving this requires discretizing not only the Laplacian term but also the additional nonlinear term \(\|\nabla w\|_2^{2}\). After spatial discretization, the squared gradient norm of a vertex-based vector \(w \in \mathbb{R}^{|V|}\) is computed as
where \(G_{k} \in \mathbb{R}^{|F| \times |V|}\) is the discrete gradient operator on mesh faces for the \(k\)-th coordinate direction, and \(W \in \mathbb{R}^{|V| \times |F|}\) is an averaging operator that maps face values to vertex values. With this discretization, the ODE for \(w(t) \in \mathbb{R}^{|V|}\) becomes
Time discretization and integration. If we apply fully implicit Euler, we would find \(w^{n+1}\) by solving a nonlinear equation. To avoid this large nonlinear solve, prior work 3 splits the problem into two substeps:
a linear diffusion part: \(\frac{\mathrm{d}w}{\mathrm{d}t} = M^{-1}L\,w\), which is stiff but easy to solve implicitly,
a nonlinear part: the full equation, which is nonlinear but convex.
This separation lets each component be treated with the method best suited for it: a direct linear solve for the diffusion, and a convex optimization problem for the nonlinear term. Using Strang-Marchuk splitting:
Full-step nonlinear term: Solve a convex optimization problem to find \(w^{(2)}\)
Half-step diffusion: \((M – \frac{\Delta t}{2} L)\,w^{(3)} = Mw^{(2)}\), and set \(w^{n+1} = w^{(3)}\)
A New ODE in Log-Domain
Let us revisit the spatially discretized heat (or diffusion) equation. For each vertex \(i \in \{1, \dots |V|\}\) on a mesh, its evolution is described by
where \(u\) represents our heat, and \(L\) is the discrete Laplacian operator. A key insight is that the Laplacian term can be rewritten as a sum of local differences between a point and its neighbors:
\[(Lu)_i = \sum_{j \sim i}L_{ij} (u_j – u_i).\]
This shows that the change in \(u_i\) depends on how different it is from its neighbors, weighted by the Laplacian matrix entries. Now, before doing time discretization, we use a change of variable \(w \Leftarrow\log u\), accordingly, \(u = \mathrm{e}^{w}\), to obtain the ODE
In vectorized notation, we construct a matrix \(\tilde{L} \Leftarrow L – \mathrm{diag}(L_{11}, \dots, L_{|V| \,|V|})\), which zeros out the diagonals of \(L\). Then the log heat \(w(t) \in \mathbb{R}^{|V|}\) follows the ODE
This ODE is equivalent to the standard heat ODE up to a log transform of the integrated results, but crucially it avoids the amplification of small relative errors in \(u\) when \(u\) is tiny by integrating directly in log-space.
Numerical Solvers for Our New ODE
Forward (Explicit Euler)
The most straightforward approach uses explicit Euler to update the solution at each time step. The update rule is
It can be shown that minimizing the objective function
\[f(w) = \sum_{i}M_iw_i\]
with the inequality constraint defined above yields the exact solution to the original equation. We have written a complete proof for this finding here. We implement this constrained optimization problem using the Clarabel solver 6 in CVXPY 7, a Python library for convex optimization problems.
Figure 2: Visualization of distance computations on the mesh by different solvers. Ground Truth, HeatBackward, LogHeatBackward, NewLogHeatBackward (from left to right)
Experiments
As a common application, we demonstrate the effectiveness of log heat methods for computing fast approximations of geodesic distances. As ground truth, we used libigl’s 5 implementation of exact discrete geodesic distances. We used the mesh of a kitten with 10 diffusion time-steps equally spaced between \(t=0\) and \(t=10^{-3}\).
Figure 3: Correlation plot for log heat methods against an exact geodesic computation.
Table 2: Correlation analysis results.
Mesh
# of verts
Method
Pearson \(r\) ↑
RMSE ↓
MAE ↓
Cat
9447
HeatBackward
0.986
0.285
0.241
LogHeatBackward
-0.183
0.577
0.522
NewLogHeatBackward (ours)
0.879
0.163
0.112
The flat region for the NewLogHeatBackward Solver probably indicates regions of numerical overflow/underflow, since the log of tiny heat values is going to be incredibly negative. Through our experimentation we observed that the solver for this our formulated PDE outperforms others mostly in cases where the diffusion time-step is extremely small.
References
[1] Keenan Crane, Clarisse Weischedel, and Max Wardetzky. “The Heat Method for Distance Computation.” Communications of the ACM, vol. 60, no. 11, October 2017, pp. 90–99.
[2] Justin Solomon, Fernando de Goes, Gabriel Peyré, Marco Cuturi, Adrian Butscher, Andy Nguyen, Tao Du, and Leonidas Guibas. “Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains.” ACM Transactions on Graphics, vol. 34, no. 4, August 2015.
[3] Leticia Mattos Da Silva, Oded Stein, and Justin Solomon. “A Framework for Solving Parabolic Partial Differential Equations on Discrete Domains.” ACM Transactions on Graphics, vol. 43, no. 5, October 2024.
[4] Max Wardetzky, Saurabh Mathur, Felix Kälberer, and Eitan Grinspun. “Discrete Laplace Operators: No Free Lunch.” In Proceedings of the Fifth Eurographics Symposium on Geometry Processing, pages 33–37, Barcelona, Spain, 2007.
[5] Alec Jacobson et al. “libigl: A Simple C++ Geometry Processing Library.” Available at https://github.com/libigl/libigl, 2025.
[6] Paul J. Goulart and Yuwen Chen. “Clarabel: An Interior-Point Solver for Conic Programs with Quadratic Objectives.” arXiv preprint arXiv:2405.12762, 2024.
[7] Steven Diamond and Stephen Boyd. “CVXPY: A Python-Embedded Modeling Language for Convex Optimization.” Journal of Machine Learning Research, vol. 17, no. 83, pages 1–5, 2016.
Figure 1. Our method reduces the variance of the Walk-on-Spheres algorithm with minimal computational overhead. We compute a harmonic control variate, apply Walk-on-Spheres to the residual, and sum the results to obtain the final solution. With 256 walks, our approach (left) achieves a substantial reduction in RMSE compared to the standard method (right).
1 Introduction
A classic problem in computer graphics is the interpolation of boundary values into a region’s interior. More formally, given a domain \( \Omega \) and values prescribed on its boundary \( \partial \Omega \), we seek a smooth function \( u: \mathbb{R}^n \to \mathbb{R} \) that satisfies the boundary value problem
$$\begin{align} \Delta u(x) = 0 \text{ in } \Omega, \ \quad u(x) = g(x) \text{ on } \partial \Omega, \end{align}$$
where \(g(x)\) is a given boundary function. This is the well-known Laplace’s equation with Dirichlet boundary conditions. A standard approach for solving such PDEs is the finite element method (FEM), which discretizes the domain into a mesh and computes a weak solution. However, the accuracy of FEM depends critically on the quality of the mesh and how well it approximates the domain geometry. In contrast, Monte Carlo methods offer an appealing alternative — particularly the walk on spheres (WoS) algorithm [Muller, 1956], which avoids meshing altogether. WoS exploits the Mean Value Property of harmonic functions (i.e., functions satisfying \( \Delta u = 0) \), which states:
for any ball \( B(x, r) \subseteq \Omega \). This property implies that \( u(x) \) can be estimated by recursively sampling points on the boundary of spheres centered at \( x \), continuing until a point lands on \( \partial \Omega \), where the value is known from \( g(x) \). Compared to FEM, WoS scales much better with geometric complexity and does not require certain properties of the mesh like watertightness, free of self-intersections, manifoldness, etc. While WoS does not generalize to all classes of PDEs, it offers many advantages even beyond geometric robustness (See Sawhney and Crane 2020, Section 1).
While WoS is an unbiased estimator and is largely agnostic to geometric quality, it can suffer from high variance, often requiring many samples to converge and thus incurring high computational cost. A common way to reduce variance is to use a control variate, and prior work has explored applying them to WoS. The main challenge in using control variates is finding a suitable reference function (one with a known integral) that also closely matches the target function. One research direction to address this challenge is neural control variates—constructing the control variate function using neural networks (See Li et al. 2024, who applied this approach to WoS). While their method leverages a clever technique known as automatic integration [Li et al. 2024] to perform closed-form integration, it still suffers from substantial overhead during both training and inference due to the use of MLPs. In contrast, we propose harmonic control variates, where harmonicity is built into the control variate representation by design, enabling closed-form integration directly as a function evaluation. Moreover, our representation is fitted using a simple least squares procedure, which is significantly more efficient than training a neural network via gradient descent. It also allows for much faster evaluation at inference time.
Specifically, our approach consists of two main steps. (1) We construct a harmonic representation as a weighted sum of harmonic kernels (e.g., Green’s functions), and fit the weights/coefficients using a sparse set of points in the domain along with their WoS-based estimates. This fitting reduces to solving a linear system of the form \( \Phi \sigma = u \) for the weight vector \( \sigma \). The resulting function is harmonic by construction, and although it provides a useful denoised estimate, it remains biased with respect to the true solution \( u(x) \) almost everywhere in the domain. (2) To correct for this bias, we use the harmonic representation as a control variate, thus termed harmonic control variate. Because the representation is harmonic, its mean over any sphere can be computed in closed form via simple function evaluation. We then define a residual Laplace problem, where the boundary condition is the difference between the true boundary data and our representation. Solving this residual problem with WoS and adding the result back to the harmonic representation yields an unbiased estimator for \( u(x) \). This control variate estimator reduces variance whenever the representation closely approximates \( u(x) \) on the boundary.
Our harmonic control variate achieves significant variance reduction compared to naive WoS across a range of geometries, yielding an average 40% reduction in RMSE compared to the naive baseline at 128 walks. In addition to its effectiveness, our method is simple to implement: the entire pipeline can be reproduced in Python with minimal code, leveraging just a few function calls to the underlying integrated C++ library Zombie [Sawhney and Miller 2023]. To illustrate its flexibility, we also extend our approach to settings with Neumann boundary conditions and with an alternative Monte Carlo scheme Walk on Stars [Sawhney, Miller, et al. 2023]. We believe that this classically inspired yet practical control variate representation (a sum of harmonic kernels) offers useful insights for the broader computational science and graphics communities.
Figure 2. Finding a suitable control variate function, ours vs. MLP
2 Background
2.1 Walk on Spheres
The Walk on Spheres (WoS) algorithm [Muller 1956] can be motivated in two complementary ways. One is via Kakutani’s principle, which states that the solution value \( u(x) \) of a harmonic function at any point \( x \in \Omega \) is equal to the expected boundary value \( u(x) = \mathbb{E}[g(y)] \), where \( y \in \partial \Omega \) is the first boundary point reached by a Brownian motion starting at \( x \). We focus on a second, more constructive motivation coming from the mean value property of harmonic functions (see eq. (*)). This property immediately suggests a single-sample Monte Carlo estimator:
where \( \mathscr{U}(\partial B(x,r)) \) denotes the uniform distribution on the sphere of radius \( r \) centered at \( x \). Specifically, the mean value property suggests that this estimator is unbiased:
where \( p(y) = 1/|\partial B(x,r)| \) denotes the uniform probability density on \( \partial B(x, r) \).
However, the difficulty is that \( u(y) \) is unknown unless \( y \) lies on the boundary \( \partial \Omega \). To address this, the mean value property is applied recursively: each sampled point becomes the new center for the next sphere, and the process continues until the boundary is reached. Since naïve recursion would never terminate, the algorithm introduces a tolerance \( \epsilon \). Specifically, once a point \( x_k \) lies within distance \( \epsilon \) of the boundary, we project it to the nearest boundary point \( \bar{x}_k \in \partial \Omega \) and evaluate the boundary condition \( g(\bar{x}_k) \). This serves as the base case of the recursion.
Formally, letting \( \Omega_\epsilon \) denote the \( \epsilon\)-shell adjacent to the boundary (illustrated in Figure 3), the recursive rule is
with the next sample \( x_{k+1} \sim \mathscr{U}(\partial B(x_k, r_k)) \), where \( r_k \) is the largest radius such that the ball remains fully contained in \( \Omega \). As the recursion telescopes, the final estimator is simply
where each \( \bar{x}_{n_i}^{(i)} \in \partial \Omega \) is the endpoint of the \( i \)-th walk, reached on its \( n_i \)-th step. This estimator remains unbiased as \( \epsilon \to 0 \), but its efficiency is governed by the variance of the boundary condition \( g(x) \). In fact, the variance of the estimator is directly proportional to that of \( g(x) \):
Intuitively, if \( g(x) \) varies significantly across the boundary region where walks terminate, the estimates \( \langle u(x_0) \rangle \) will fluctuate sharply between different exit points, leading to a noisy approximation of the solution. Conversely, if \( g(x) \) is nearly constant around the relevant boundary region, most walks yield similar values, and the estimator stabilizes. This observation is central to our two-pass variance-reduction strategy (see Section 3.2).
2.2 Control Variates
To build intuition, let us first consider the simple task of integrating a scalar function in one dimension. Suppose we want to estimate
$$\begin{align} F = \int_{a}^{b} f(x)\, \mathrm{d}x, \end{align}$$
using Monte Carlo (MC) sampling. A naive single-sample Monte Carlo estimator is
$$\begin{align} \langle F \rangle = |b-a|\, f(x), \quad x \sim \mathscr{U}[a,b]. \end{align}$$
The idea of a control variate is to introduce a function (c(x)) that (i) closely approximates (f(x)) and (ii) has a known closed-form integral, denoted (C). We can rewrite the target integral as
This yields a single sample Monte-Carlo estimator with control variate:
$$\begin{align} \langle F \rangle^{(\text{cv})} = |b-a| \big(f(x) – c(x)\big) + C, \quad x \sim \mathscr{U}[a,b]. \end{align}$$
It is clear that it remains unbiased. Moreover, it achieves lower variance whenever the residual \( r(x) := f(x) – c(x) \) is less variable than the original function \( f(x) \):
$$\begin{align} \mathrm{Var} [\langle F \rangle^{(\text{cv})} ] < \mathrm{Var}[\langle F \rangle] \quad \iff \quad \mathrm{Var}[f(x) – c(x)] < \mathrm{Var}[f(x)]. \end{align}$$
Figure 4. An illustration of control variate reducing Monte Carlo estimation variance in 1D.
Control Variates for Walk-on-Spheres. Motivated by this principle, prior work [Li et al. 2024] incorporated control variates directly into the mean value formulation of WoS. At each step of the walk, they replace the boundary condition with a residual correction, and by eq. (*),
where \( C(x) = \frac{1}{\lvert \partial B(x,r)\rvert}\int_{\partial B(x,r)} c(y)\,\mathrm{d}y \) and \( c : \mathbb{R}^d \to \mathbb{R} \) is the control variate. This introduces the challenge of designing a control variate \(c\) that is both expressive and admits a tractable integral \(C(x)\). Li et al. [2024] addressed this by parameterizing \( c \) with a neural network and using automatic integration to evaluate \(C(x)\). However, this approach requires costly neural network evaluations at every step of the walk and additional training via gradient descent. In contrast, we take the insight that if \( c \) is also harmonic, then \( C(x) = c(x) \) by eq. (*). Our method only evaluates the control variate twice per walk and fits it directly via least squares, making it significantly more efficient while still achieving variance reduction.
3 Method
Our method introduces a harmonic representation that serves as a control variate for variance reduction. We first describe the formulation of this harmonic representation and how it can be efficiently solved in Section 3.1. Next, we explain how this representation functions as a control variate to reduce variance in Monte Carlo estimation in Section 3.2. Finally, we present implementation details in Section 3.3. An overview of our method is illustrated in Figure 5.
3.1 Mixture of Harmonic Kernels Representation
We propose a simple, kernel-based representation for approximating a target harmonic function \( u(x) \), subject to Dirichlet boundary conditions \( u(x) = g(x) \) on \( \partial \Omega \). Specifically, we approximate \( u(x) \) using a mixture of harmonic kernels \( \phi_j(\,\cdot\,) \):
where each \( \phi_j(x) \) is itself harmonic. By linearity, the mixture \( c(x) \) is also harmonic. To ensure that \( c(x) \) remains free of singularities inside the domain \( \Omega \), the kernels are chosen carefully. The coefficients \( \{\sigma_j\}_{j=1}^M \) are free parameters that must be fitted to approximate \( u(x) \).
There are many possible choices of harmonic kernels. A canonical example in two dimensions is the Green’s function,
where the centers \( \{y_j\} \) are placed outside of \( \Omega \) to avoid singularities within the domain. We will introduce another kernel choice in Section 4.
Fitting the coefficients: To determine the coefficients \( \sigma_j \), we first sample a sparse set of interior points \( \{x_i\}_{i=1}^N \subset \Omega \). At each point, we estimate \( \langle u(x_i) \rangle \) using Walk-on-Spheres (WoS), which leverages the known boundary condition. This produces training tuples \( \{(x_i, \langle u(x_i) \rangle) \}_{i=1}^N \), where \( c(x_i) \) should approximate \( \langle u(x_i) \rangle \). Each tuple gives a linear constraint on \( \sigma_j \), leading to the linear system:
such that we can introduce some regularization terms (see Section 3.3). An example of this fitting procedure using the Green’s kernel is illustrated in Figure 6.
Figure 5. An overview of our method.
Figure 6. The fitting process of our mixture of kernels representation 𝑐 (𝑥).
Intuitively, this representation projects the set of sparse and noisy estimates of \( u(x_i) \), i.e., the vector \( \langle u \rangle \), onto the subspace spanned by the harmonic kernels \( \{\phi_j(x)\} \). Since this fit is based only on a limited set of points and kernels and does not enforce correctness elsewhere, this representation is a biased estimator. Therefore, while this representation \( c(x) \) serves as a useful denoiser due to the smoothness of the kernels, it is not a sufficiently accurate approximation of the true solution \( u(x) \), see Figure 5. In the next section, we show how it can be leveraged as a control variate to reduce variance and obtain an unbiased estimate of \( u(x) \) throughout \( \Omega \).
3.2 Two-Pass with Control Variates
The main advantage of our representation \( c(x) \) being harmonic is that it satisfies \( \Delta c(x) = 0 \) in \( \Omega \). This allows us to define a residual PDE:
which corresponds to solving for the harmonic residual function \( r(x) := u(x) – c(x)\). Since the Dirichlet boundary condition for \( r(x) \) is known, namely \( r(x) = g(x) – c(x) \) on \( \partial \Omega \), we can apply the Walk-on-Spheres (WoS) algorithm to estimate \( \langle r(x) \rangle \). Our control variate estimator for \( u(x) \) is then given by
The argument for why this estimator is still unbiased and has smaller variance is similar to the 1D case outlined in Section 2.2. This estimator is unbiased since
where we used the fact that \( \mathbb{E}[\langle r(x) \rangle] = r(x) \) due to the unbiasedness of WoS. Therefore, as the number of random walks increases, the control variate estimator \( \langle u(x) \rangle^{(\text{cv})} \) converges to the true solution \( u(x) \). For variance reduction, observe that
As discussed earlier in the end of Section 2.1, variance is reduced when \( c(x) \) provides a good approximation to \( u(x) \) on the boundary. In this case, the residual boundary condition \( g(x) – c(x) \) has a smaller range, resulting in reduced variance in the WoS estimates for \( r(x) \) and, by extension, for \( u(x) \).
3.3 Implementation Details
We next describe several implementation choices in setting up and solving the linear system, as well as in implementing control variates.
Linear System Setup. Our formulation yields an \( N \times M \) linear system, where \( N \) is the number of constraints (sampled interior points) and \( M \) is the number of kernels. We deliberately set \( N > M \), making the system over-constrained. This avoids ill-posedness and eliminates the possibility of infinitely many solutions. To further enhance stability, we adopt Kd-tree based adaptive sampling, which distributes interior points evenly across the domain rather than allowing clusters in small regions. This reduces the risk of nearly colinear rows in the system matrix, which would otherwise harm numerical conditioning. Finally, we add regularization to eq. (***) to improve robustness near the boundary. Without it, the fitted representation tends to produce large coefficient magnitudes, leading to spikes and non-smooth artifacts. We experimented with both \( \ell_2 \) (ridge) regularization, which discourages large weights, and \( \ell_1 \) (lasso) regularization, which can additionally drive coefficients to zero. Both were effective, but we ultimately used \( \ell_2 \) in our experiments.
Control Variates with Zombie [Sawhney and Miller 2023].We build on the Zombie library, which provides efficient C++ implementations of Monte Carlo geometry processing methods along with Python bindings. In the baseline setting, Zombie takes as input the geometry together with boundary conditions sampled on a grid. During Walk-on-Spheres (WoS), Zombie interpolates these grid values whenever a trajectory hits the boundary, and then outputs estimated interior values on a user-specified grid resolution. Our variance reduction technique using control variates requires only a minimal change to this setup. Instead of supplying the original boundary condition, we provide the boundary condition of the residual function. Zombie then performs WoS to estimate the residual inside the domain. The final solution is obtained by simply adding back the evaluation of \( c(x) \) on the grid to these residual estimates. This simple modification makes the method straightforward to implement while still delivering effective variance reduction.
4 Experiments
We explore two types of harmonic kernels for our harmonic representation. The first is the 2D fundamental solution (Green’s function): for \( x,y\in\mathbb{R}^2 \),
which satisfies \( \Delta_x G^{\mathbb{R}^2}(x,y)=\delta(x-y)\) and thus harmonic in \(x\) away from the singularity at \(y\); it underlies methods such as the Mixture of Fundamental Solutions and boundary-element approaches.
The second is the angle kernel obtained by integrating an angular density along a directed curve. For 2D polylines, the kernel admits a closed form: writing the (i)-th directed segment as \( \ell_i=[a_i,b_i] \), the associated angle kernel is defined as
with \( \det(u,v)=u_1v_2-u_2v_1 \). The \( \operatorname{atan2} \) formulation is numerically robust and returns the signed angle in \((-\pi,\pi]\). One can verify that the kernel is harmonic for \(x\notin\ell_i\); across the segment the angle exhibits \(\pm 1\) jump discontinuities. This angle representation is inspired by generalized winding-number and boundary-integral formulations (see Jacobson, Kavan, and Sorkine [2013])
Figure 7. The angle kernel associated with a line segment
To verify our 2-pass method, we compared it against a reference solution obtained by running WoS with 32,768 walks, which we take as ground truth since this number is sufficient for convergence. For evaluation, we tested using between 4 and 1024 walks, and compared three methods—naive WoS, control variates with the Green’s function, and control variates with the angle kernel. We measured the median RMSE across 20 unique domains relative to the reference solution.
For all experiments, we used the following settings. The grid resolution was \( 256 \times 256 \) and the epsilon shell for WoS was \( 0.001 \). The number of basis functions for the denoiser was \( 80 \) and the number of walks for the samples for the denoisers was \( 128 \). We used \( \ell_2 \) regularization with \( \alpha = 0.1 \). The Dirichlet boundary condition we used is given by:
Figure 8a. Comparing Green’s kernel vs. Angle kernel against naive WoS, on a single domain with varying number of walks.
Figure 8b. Comparing our method vs. naive WoS for a variety of domains. We used Green’s kernel and 256 walks.
Based on RMSE, we observed that our control variate method for both kernels outperformed the naive WoS on average. The control variate with Green’s theorem showed significant linear improvement as the number of walks increased. The angle kernel did not yield as good results as Green’s; however, in the vast majority of experiments, the angle kernel control variate still achieved better RMSE than naive WoS. The only exception we observed was on more complex domains with a high number of walks. In contrast, for the Green’s kernel, we did not encounter any domain or walk count in which naive WoS outperformed the control variate method.
For computation time, we observed that the denoising process has a fixed amount of overhead. Therefore, as the number of walks increases, the denoising time becomes a negligible factor in the overall computation time. The bottleneck is then essentially the same as the naive method: the random walks.
Figure 9. RMSE comparing different kernels, varying the number of walks
Figure 10. Runtime comparison between naive WoS and our harmonic control variate method
5 Extensions
5. 1 N-Pass Gradient Descent Approach
Building upon the 2-pass harmonic control variate method, we experimented with an n-pass gradient descent approach to iteratively refine the control variate using Monte Carlo estimates as supervision. This method was motivated by the Neural Walk-on-Spheres framework [Nam, Berner, and Anandkumar 2024], which uses a general-purpose neural network \( u_\theta(x) \) as the learned control variate. Our method instead optimizes the coefficient weights \( \sigma ∈ ℝ^{M} \) of a harmonic control variate \( c(x) \), adhering to the structure of the underlying PDE.
The weights \( \sigma \) are initialized from a small set of sample points via the linear system (**), and stochastic gradient descent aims to then minimize the discrepancy between the control variate \( c(x) \) and Monte Carlo estimates \( \langle u(x) \rangle \).
Our optimization loop is implemented as follows:
Sample a batch of interior points \( x_i \) and compute Monte Carlo estimates \( \langle u(x_i) \rangle \) for each batch point using WoS
Evaluate the current control variate at each \( x_i \), and compute the mean square error (MSE) loss $$\mathscr{L} = \frac{1}{B} \sum_{i=1}^{B} \left( c(x_i) – \langle u(x_i) \rangle \right)^2$$
Perform a gradient step on \( \mathscr{L} \) with respect to \( \sigma \) via Adam optimizer
Figure 11. Naive WoS with 64 walks, 65536 points vs. n-pass with 32 pts/batch, 512 learning steps, 128 walks per sample pt., 32 walks in residual, Green’s function, learning-rate=0.01
For the same total number of walks, Figure 11 illustrates a significant improvement in RMSE compared to naive WoS, with the contours indicating a much less noisy interpolation of the boundary for our n-pass solution.
Further improvements include accelerating the repeated computation of \( c(x) \), adaptively adjusting the learning rate, and experimenting with different weight initializations. For example, when artifacts appeared in the denoiser due to a large number of monopoles, these artifacts remained present even after the optimization, since weights were initialized based on the denoiser.
5.2 Neumann Boundary Conditions
Many real-world PDE problems involve boundary behavior that is best described by flux, rather than a set of fixed values. Neumann conditions prescribe the normal derivative of the solution along the boundary. In the case of mixed boundary conditions, we seek a function \( u \) satisfying:
$$ \Delta u = 0 \text{ in } \Omega, \quad u = g \text{ on } \partial\Omega_D, \quad \frac{\partial u}{\partial n} = h \text{ on } \partial\Omega_N $$
where the boundary \( \partial \Omega = \partial \Omega_D \cup \partial\Omega_N \) is partitioned into Dirichlet and Neumann components. A 2-pass approach in this setting would first construct a control variate \( c(x) \approx u(x) \), and estimate the residual \( u(x) – c(x) \) which satisfies:
$$\Delta(u – c) = 0 \text{ in } \Omega, \quad (u – c) = (g – c) \text{ on } \partial\Omega_D, \quad \frac{\partial}{\partial n}(u – c) = h – \frac{\partial c}{\partial n} \text{ on } \partial\Omega_N$$
In particular, when we choose Green’s function for our control variate, the Neumann boundary condition is corrected via the Poisson kernel,
By using this residual-based formulation, we apply the control variate to both value and flux constraints, enabling variance reduction even in domains with mixed boundaries.
Figure 12. Naive vs. 2-pass method with Dirichlet BC 𝑢 (𝑥, 𝑦) = cos(2𝜋𝑦) and Neumann BC 𝜕𝑢/𝜕𝑛 = 0, num of walks = 128
To handle Neumann boundary conditions, the original WoS algorithm needs to be modified to address the behavior of walks near Neumann boundaries, the radius of closed balls around each point, and conditions for walk termination. The Walk on Stars algorithm (Sawhney et al., 2023) does exactly this. Walk on Stars generalizes WoS by replacing ball-shaped regions with stars — a star being any shape where every boundary point can be “seen” from the centre. At Neumann boundaries, Walk on Stars simulates reflecting Brownian motion — when a walk reaches a Neumann boundary, it is reflected according to the boundary’s normal vector and picks up a contribution of the Neumann condition, \( h \).
Figure 12 illustrates our results with a zero-valued Neumann boundary condition.The 2-pass method still reduces the RMSE compared to Naive WoS with the same number of walks. However, the contours indicate somewhat increased noise near the reflecting boundary compared to the absorbing boundary on our solution.
5.3 Grouping Techniques to Reduce Overfitting and Improve Stability
Figure 13. Density-based grouping, with improved RMSE for Angle Kernel (right)
As detailed in Section 3.3, a major consideration when estimating control variates is the conditioning and placement monopoles. To further address this, we experimented with grouping techniques that compress nearby or similar samples and assign them a shared influence, namely:
Greedy grouping: Choose a point and assign all other points within a fixed radius to one group. Each groups shares either a weight or a single basis function, effectively reducing the number of active degrees of freedom in the system. Our experiments show a reduction in RMSE via grouping for the angle kernel (see Figure 13), but contours indicate a slight increase in noise in some regions.
DBSCAN: This density-based clustering technique automatically identifies dense clusters of points. Each cluster is treated as a group, and only a representative or averaged contribution is retained, while the rest are collapsed into it.
6 Conclusions and Future Directions
We introduced harmonic control variates: a simple, closed-form control variate for Walk-on-Spheres built from mixtures of harmonic kernels. By fitting a lightweight least-squares model once and efficiently evaluating it in closed form during sampling, we turn WoS into a two-pass estimator that remains unbiased while substantially lowering variance. Empirically, our method delivers consistent accuracy gains across diverse geometries—e.g., an average 40% RMSE reduction at 128 walks—while adding only minimal overhead compared to previous neural control variate approaches; as walks dominate runtime, the extra cost is effectively negligible. The result is a method that is easy to implement (a small change to a standard WoS pipeline), robust to geometric complexity, and markedly more sample-efficient.
Future work. We plan to release a plug-in that drops into existing WoS codebases and to continue improving our method for even stronger variance reduction at essentially the same runtime.
References
Jacobson, Alec, Ladislav Kavan, and Olga Sorkine [2013]. “Robust Inside-Outside Segmentation using Generalized Winding Numbers”. In: ACM Trans. Graph. 32.4.
Li, Zilu et al. [2024]. “Neural Control Variates with Automatic Integration”. In: ACM SIGGRAPH 2024 Conference Papers. SIGGRAPH ’24. Denver, CO, USA: Association for Computing Machinery. isbn: 9798400705250. doi: 10.1145/3641519. 3657395. url: https://doi.org/10.1145/3641519.3657395.
Madan, Abhishek et al. [July 2025]. “Stochastic Barnes-Hut Approximation for Fast Summation on the GPU”. In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers. SIGGRAPH Conference Papers ’25. ACM, pp. 1–10. doi: 10.1145/3721238.3730725. url: http://dx.doi.org/10.1145/3721238.3730725.
Muller, Mervin E. [1956]. “Some Continuous Monte Carlo Methods for the Dirichlet Problem”. In: The Annals of Mathematical Statistics 27.3, pp. 569–589. doi: 10.1214/aoms/1177728169. Url: https://doi.org/10.1214/aoms/1177728169.
Nam, Hong Chul, Julius Berner, and Anima Anandkumar [2024]. “Solving Poisson Equations Using Neural Walk-on-Spheres”. In: Forty-first International Conference on Machine Learning
Sawhney, Rohan and Keenan Crane [Aug. 2020]. “Monte Carlo geometry processing: a grid-free approach to PDE-based methods on volumetric domains”. In: ACM Trans. Graph. 39.4. issn: 0730-0301. Doi: 10.1145/3386569.3392374. url: https://doi.org/10.1145/3386569.3392374.
Sawhney, Rohan and Bailey Miller [2023]. Zombie: Grid-Free Monte Carlo Solvers for Partial Differential Equations. Version 1.0.
Sawhney, Rohan, Bailey Miller, et al. [July 2023]. “Walk on Stars: A Grid-Free Monte Carlo Method for PDEs with Neumann Boundary Conditions”. In: ACM Trans. Graph. 42.4. issn: 0730-0301. doi: 10.1145/3592398. url: https://doi.org/10.1145/3592398.