Beyond Spherical Harmonics: Rethinking Appearance Models for Radiance Reconstruction

Source: arXiv:2606.09794 · Published 2026-06-08 · By Ewa Miazga, Jorge Condor, Piotr Didyk

TL;DR

This paper addresses the challenge of modeling view-dependent appearance in novel-view synthesis and radiance reconstruction. Capturing complex angular effects such as specularities typically requires high-frequency spherical representations, which are computationally expensive and memory-intensive. The common choice, spherical harmonics (SH), is band-limited, so practitioners fall back on low-order SH, which produce overly smooth angular detail and fail on sharp highlights. The authors systematically evaluate a broad spectrum of parametric spherical functions, some introduced to graphics for the first time, and identify three properties that drive their effectiveness: multi-modality, closed-form integration, and anisotropy. Based on these insights, they propose a new spherical function, the Normalized Anisotropic Spherical Gabor (NASGabor) kernel, which combines a normalized anisotropic Gaussian envelope with a harmonic carrier to model multi-lobe and anisotropic view-dependent effects compactly. Experiments on several datasets, including Mip-NeRF360, show that NASGabor improves reconstruction quality (about +0.5 dB PSNR over an SH baseline in the same framework), models highlights and glints better, and uses up to five times less memory than standard low-order SH. Training and rendering speeds remain competitive with SH baselines, and the kernel keeps analytic integrals and gradients for stable optimization.

Key findings

NASGabor reduces memory usage by up to 5× compared to degree-3 spherical harmonics (12 to 39 appearance parameters per primitive instead of 48) while improving PSNR on Mip-NeRF360 by about 0.5 dB over SH in the same framework (28.46 vs 28.00 against Beta Splatting with SH).
In indoor scenes with dense views, anisotropic spherical functions like NASGabor improve reconstruction quality consistently over isotropic or low-order spherical harmonics, as shown by PSNR and perceptual metrics.
Multiple lobes and multi-modal spherical distributions can better approximate complex angular effects, but adding more than 2 lobes yields diminishing returns due to optimization difficulties (Figure 4).
Normalization of spherical functions (enforcing unit integral) stabilizes optimization and improves final accuracy across datasets (Figure 5).
Closed-form integral expressions and analytic derivatives for NASGabor enable efficient training and rendering, with training times competitive with the baselines (14m42s with the approximate integral vs 20m20s with the exact one).
SH’s low-frequency smoothness supports robustness in sparse viewpoint scenarios (outdoor scenes), but NASGabor matches quality there while excelling when angular coverage is sufficient.
The proposed learning rate schedule based on average nearest-neighbor camera distance mitigates overfitting of more expressive spherical functions on sparse data.
Qualitative results (Figure 6) show sharper and more accurate specular highlights and view-dependent effects reconstructed by NASGabor compared to SH and other baselines.

Methodology — deep read

The authors focus on primitive-based radiance fields, where a scene is represented as a collection of 3D primitives (typically anisotropic Gaussians), each encoding geometry and view-dependent appearance. The central problem is modeling the directional radiance c_i(d) of primitive i as a function of the viewing direction d on the unit sphere.

They start from the common formulation that represents c_i(d) with spherical harmonics (SH), i.e. as a linear combination c_i(d) = Σ_l Σ_m c_lm Y_l^m(d) of a fixed orthonormal basis Y_l^m(d) with learned coefficients c_lm. Truncated at low degree, SH are band-limited and therefore smooth out angular detail. To overcome these limitations, the authors systematically study a broad set of parametric spherical functions: isotropic kernels (Spherical Gaussian, Spherical Cauchy, Spherical Beta), ring-like/bimodal functions (Spherical Logistic, Fisher-Bingham families), and anisotropic functions (Anisotropic Spherical Gaussian (ASG), Normalized ASG (NASG), Linearly Transformed Cosines (LTC)). Some of these functions have closed-form integrals and analytic derivatives, which make rendering and optimization efficient.
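To make the linear-combination view concrete, here is a minimal Python sketch that evaluates real spherical harmonics up to degree 2, forms c(d) from one RGB coefficient triple per basis function, and checks numerically that the basis is orthonormal. The constants follow one common real-SH convention (libraries differ in signs and ordering), and all function names are illustrative, not taken from the paper's code.

```python
import math

# Real SH constants (one common convention; sign/order conventions vary by library).
C0 = 0.5 * math.sqrt(1.0 / math.pi)        # Y_0^0
C1 = math.sqrt(3.0 / (4.0 * math.pi))      # degree-1 terms
C2_XY = 0.5 * math.sqrt(15.0 / math.pi)    # xy, yz, xz terms
C2_Z = 0.25 * math.sqrt(5.0 / math.pi)     # 3z^2 - 1 term
C2_XX = 0.25 * math.sqrt(15.0 / math.pi)   # x^2 - y^2 term


def sh_basis(d, degree):
    """Evaluate the real SH basis at unit direction d for l <= degree (max 2 here)."""
    if not 0 <= degree <= 2:
        raise ValueError("this sketch only covers degrees 0..2")
    x, y, z = d
    out = [C0]
    if degree >= 1:
        out += [C1 * y, C1 * z, C1 * x]
    if degree >= 2:
        out += [C2_XY * x * y, C2_XY * y * z, C2_Z * (3.0 * z * z - 1.0),
                C2_XY * x * z, C2_XX * (x * x - y * y)]
    return out


def sh_radiance(coeffs_rgb, d, degree):
    """c(d) = sum over (l, m) of c_lm * Y_lm(d), with one learned RGB triple per basis function."""
    basis = sh_basis(d, degree)
    if len(coeffs_rgb) != len(basis):
        raise ValueError("need (degree + 1)**2 RGB coefficients")
    return tuple(sum(c[ch] * b for c, b in zip(coeffs_rgb, basis)) for ch in range(3))


def gram_matrix(degree, n_theta=200, n_phi=32):
    """Midpoint-rule estimate of the integral of Y_i * Y_j over the sphere (should be ~identity)."""
    n = (degree + 1) ** 2
    gram = [[0.0] * n for _ in range(n)]
    dt, dp = math.pi / n_theta, 2.0 * math.pi / n_phi
    for a in range(n_theta):
        theta = (a + 0.5) * dt
        st, ct = math.sin(theta), math.cos(theta)
        w = st * dt * dp  # solid-angle weight sin(theta) dtheta dphi
        for b in range(n_phi):
            phi = (b + 0.5) * dp
            ys = sh_basis((st * math.cos(phi), st * math.sin(phi), ct), degree)
            for i in range(n):
                for j in range(n):
                    gram[i][j] += w * ys[i] * ys[j]
    return gram
```

With degree 3 (16 basis functions per channel) the same construction needs 3 × 16 = 48 coefficients per primitive, the SH budget quoted in the baselines.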

They implement a unified appearance model with learned weighted sums of multiple lobes of these functions plus a diffuse RGB base color, enforcing normalization on the spherical functions to improve stability. The number of lobes L is varied to balance expressiveness and model complexity.
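A minimal sketch of this multi-lobe model, using normalized spherical Gaussians as a stand-in lobe because their normalization has a simple closed form: the unnormalized lobe exp(λ(μ·d − 1)) integrates to 2π(1 − e^(−2λ))/λ over the sphere, so dividing by that constant gives unit integral. Any of the studied lobe functions could take its place; the function names are hypothetical, and the Monte Carlo helper only verifies the normalization.

```python
import math
import random


def normalized_sg(d, axis, sharpness):
    """Spherical Gaussian exp(lam * (axis.d - 1)) divided by its closed-form integral
    2*pi*(1 - exp(-2*lam)) / lam, so it integrates to 1 over the unit sphere."""
    if sharpness <= 0.0:
        raise ValueError("sharpness must be positive")
    lam = sharpness
    cos_angle = sum(a * b for a, b in zip(axis, d))
    norm = lam / (2.0 * math.pi * (1.0 - math.exp(-2.0 * lam)))
    return norm * math.exp(lam * (cos_angle - 1.0))


def appearance(d, base_rgb, lobes):
    """Diffuse RGB base plus a weighted sum of L lobes.
    Each lobe is (rgb_weight, unit_axis, sharpness)."""
    out = list(base_rgb)
    for rgb_weight, axis, lam in lobes:
        f = normalized_sg(d, axis, lam)
        for ch in range(3):
            out[ch] += rgb_weight[ch] * f
    return tuple(out)


def mc_sphere_integral(fn, n=200_000, seed=0):
    """Monte Carlo estimate of the integral of fn over the sphere using uniform directions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Normalizing an isotropic 3D Gaussian sample gives a uniform direction.
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        r = math.sqrt(x * x + y * y + z * z)
        total += fn((x / r, y / r, z / r))
    return 4.0 * math.pi * total / n
```

Because each lobe integrates to one, its RGB weight alone sets the lobe's total energy, and sharpening a lobe (raising λ) concentrates that energy without changing it.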

For the new NASGabor function, they multiply a normalized anisotropic spherical Gaussian envelope by a harmonic cosine carrier with frequency parameter k, yielding anisotropic, multi-modal lobes with ripple detail. They derive closed-form integrals and analytic gradients for backpropagation.
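The exact NASGabor formula and its closed-form normalization are in the paper's supplementary material and are not reproduced here. The sketch below only illustrates the envelope-times-carrier structure under stated assumptions: the envelope uses the standard anisotropic spherical Gaussian form max(d·z, 0)·exp(−λ(d·x)² − μ(d·y)²) in a lobe frame (x, y, z); the carrier 0.5·(1 + cos(k·(d·x))) is a hypothetical non-negative choice; and normalization is done by numerical quadrature instead of the paper's analytic integral. All names are hypothetical.

```python
import math


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def nasgabor_unnormalized(d, frame, lam, mu, k):
    """Hypothetical envelope x carrier; frame = (x_axis, y_axis, z_axis), orthonormal."""
    x_ax, y_ax, z_ax = frame
    dx, dy, dz = _dot(d, x_ax), _dot(d, y_ax), _dot(d, z_ax)
    # ASG-style envelope: smooth term max(d.z, 0), falloff lam along x and mu along y.
    envelope = max(dz, 0.0) * math.exp(-lam * dx * dx - mu * dy * dy)
    # ASSUMED carrier: non-negative cosine ripple along the lobe's x axis, frequency k.
    carrier = 0.5 * (1.0 + math.cos(k * dx))
    return envelope * carrier


def sphere_integral(fn, n_theta=128, n_phi=256):
    """Midpoint-rule estimate of the integral of fn over the unit sphere."""
    dt, dp = math.pi / n_theta, 2.0 * math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * dt
        st, ct = math.sin(theta), math.cos(theta)
        for b in range(n_phi):
            phi = (b + 0.5) * dp
            total += fn((st * math.cos(phi), st * math.sin(phi), ct)) * st
    return total * dt * dp


def make_nasgabor(frame, lam, mu, k):
    """Return a unit-integral lobe; the normalizer is found numerically, not in closed form."""
    z = sphere_integral(lambda d: nasgabor_unnormalized(d, frame, lam, mu, k))
    if z <= 0.0:
        raise ValueError("lobe has no energy; check frame and parameters")
    return lambda d: nasgabor_unnormalized(d, frame, lam, mu, k) / z
```

In this sketch, setting k = 0 collapses the carrier to 1 and leaves a normalized ASG lobe, so the ripple term is the only addition over a plain anisotropic lobe.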

Experiments are conducted primarily on Mip-NeRF360 (high-resolution indoor/outdoor scenes with varying camera densities), plus the Tanks and Temples and Deep Blending datasets. They use the gsplat framework with Beta Splatting opacity kernels to represent primitives, and fix the number of primitives (~1 million) and training iterations across models for fairness. Learning rates for the spherical-function parameters are set automatically from the average nearest-neighbor camera distance (d_knn) to prevent overfitting on sparse views.
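The summary gives neither the exact d_knn definition nor the mapping from d_knn to a learning rate. The sketch below assumes d_knn is the mean, over cameras, of each camera center's average distance to its k nearest other cameras, and pairs it with a hypothetical scaling rule (`scaled_lr` and `d_ref` are illustrative names) that lowers the spherical-function learning rate when cameras are sparser than a reference spacing.

```python
import math


def mean_knn_camera_distance(centers, k=1):
    """Assumed d_knn: average over cameras of the mean distance to their k nearest other cameras.
    Brute force O(n^2 log n), fine for typical capture sizes."""
    if k < 1 or len(centers) <= k:
        raise ValueError("need k >= 1 and more than k cameras")
    total = 0.0
    for i, c in enumerate(centers):
        dists = sorted(math.dist(c, o) for j, o in enumerate(centers) if j != i)
        total += sum(dists[:k]) / k
    return total / len(centers)


def scaled_lr(base_lr, d_knn, d_ref):
    """HYPOTHETICAL rule: keep base_lr for dense captures (d_knn <= d_ref) and shrink it
    proportionally as cameras get sparser, to limit overfitting of expressive lobes."""
    if d_knn <= 0.0 or d_ref <= 0.0:
        raise ValueError("distances must be positive")
    return base_lr * min(1.0, d_ref / d_knn)
```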

Evaluation metrics include PSNR, SSIM, and LPIPS. Baselines include SH-based 2DGS and 3DGS MCMC, Beta Splatting with SH or Spherical Beta, Spherical Voronoi, and recent state-of-the-art methods. Ablation studies analyze the impact of normalization, number of lobes, and integral approximations.
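Since the comparisons are reported in PSNR, it helps to translate decibel gains into error reductions. The sketch below computes PSNR for flat sequences of pixel values in [0, 1] and the MSE ratio implied by a given dB gain; it is the generic definition, not the paper's evaluation script.

```python
import math


def psnr(pred, target, max_val=1.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(db_gain):
    """MSE_new / MSE_old implied by a PSNR improvement of db_gain dB."""
    return 10.0 ** (-db_gain / 10.0)
```

For example, a +0.46 dB gain corresponds to an MSE ratio of 10^(−0.046) ≈ 0.90, i.e. roughly 10% lower mean squared error.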

Qualitative examples show improved rendering of specular highlights and view-dependent effects. Timing benchmarks on NVIDIA Grace Hopper GPUs show competitive training and rendering speeds. Some prior methods lack analytic gradients and require hybrid renderers, whereas NASGabor has a fully differentiable CUDA implementation.

The paper provides extensive analytic details on the spherical functions, parameterizations, integral formulae, and optimization details in supplementary materials. The NASGabor kernel parameters control spread, anisotropy/sharpness, and frequency of ripples, with an orthonormal frame to orient lobes.
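How the lobe frame is parameterized is not stated in this summary. A common choice in Gaussian-splatting codebases is a unit quaternion, so the sketch below (an assumption, with a hypothetical function name) converts a quaternion into the three orthonormal axes of a rotation matrix, which could serve as a lobe's (x, y, z) frame.

```python
import math


def quat_to_frame(q):
    """Quaternion (w, x, y, z) -> lobe frame (x_axis, y_axis, z_axis).
    The quaternion is normalized first; the axes are the rotation matrix columns."""
    n = math.sqrt(sum(c * c for c in q))
    if n == 0.0:
        raise ValueError("quaternion must be non-zero")
    w, x, y, z = (c / n for c in q)
    r = [
        [1.0 - 2.0 * (y * y + z * z), 2.0 * (x * y - w * z), 2.0 * (x * z + w * y)],
        [2.0 * (x * y + w * z), 1.0 - 2.0 * (x * x + z * z), 2.0 * (y * z - w * x)],
        [2.0 * (x * z - w * y), 2.0 * (y * z + w * x), 1.0 - 2.0 * (x * x + y * y)],
    ]
    # Column c of r is the image of the c-th canonical axis under the rotation.
    return tuple(tuple(r[row][col] for row in range(3)) for col in range(3))
```

Normalizing inside the function lets an optimizer update the four raw values freely while the frame stays orthonormal.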

Overall, the methodology combines rigorous theoretical exploration with practical implementation and experimental validation for radiance reconstruction via efficient, expressive spherical directional models.

Technical innovations

Introduction of the Normalized Anisotropic Spherical Gabor (NASGabor) function, combining a normalized anisotropic Gaussian envelope with a harmonic cosine carrier to model multi-modal, anisotropic view-dependent appearance with closed-form integration and analytic gradients.
Systematic evaluation of a broad variety of spherical function families (including less common ones like spherical Cauchy, Fisher-Bingham distributions) for radiance field directional modeling, with insights on multi-modality, normalization, and anisotropy.
A learning rate heuristic based on average k-nearest neighbor camera distances to stabilize training of high-capacity spherical models on datasets with varying angular sampling density.
Demonstration that normalized spherical functions improve optimization stability and final reconstruction quality by conditioning gradients and preventing numerical issues, extending normalization concepts from deep learning to spherical modeling.
Applying multi-lobe spherical functions with regularization (e.g., Spherical Wasserstein Distance) to encourage diversity and prevent lobe collapse, though with mixed impact on final quality.
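The summary does not spell out the Spherical Wasserstein regularizer. As a deliberately simplified stand-in, the sketch below treats each lobe as a point mass at its axis: for two point masses, the Wasserstein distance with a geodesic ground cost is just the geodesic angle between them, so minimizing the negative mean pairwise angle pushes lobe axes apart. It ignores lobe shape and weight, so it is not the paper's regularizer, and the function names are hypothetical.

```python
import math


def geodesic(a, b):
    """Great-circle angle between unit vectors a and b (clamped against rounding)."""
    c = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.acos(c)


def lobe_diversity_penalty(axes):
    """Negative mean pairwise geodesic distance between unit lobe axes.
    Lower is more spread out; collapsed (identical) axes score 0."""
    if len(axes) < 2:
        return 0.0
    pairs = [(i, j) for i in range(len(axes)) for j in range(i + 1, len(axes))]
    return -sum(geodesic(axes[i], axes[j]) for i, j in pairs) / len(pairs)
```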

Datasets

Mip-NeRF360 — diverse indoor/outdoor scenes — public benchmark
Tanks and Temples — multi-view 3D reconstruction benchmark — public
Deep Blending — novel view synthesis dataset — public

Baselines vs proposed

2DGS with SH (48 params): PSNR on Mip-NeRF360 = 27.22 vs NASGabor 4 lobes (39 params): 28.46 (+1.24 dB)
Beta Splatting with SH (48 params): PSNR = 28.00 vs NASGabor 2 lobes (21 params): 28.46 (+0.46 dB)
Spherical Voronoi (SV) with 12 params: PSNR = 28.19 vs NASGabor 1 lobe (12 params): 28.40 (+0.21 dB)
Spec-Gaussians (878 MB storage): PSNR = 29.00 vs NASGabor 1 lobe (320 MB): 30.14 (+1.14 dB)
Glossy-Gaussian (888 MB): PSNR = 28.64 vs NASGabor 1 lobe (320 MB): 30.14 (+1.5 dB)
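The parameter counts in these comparisons can be checked with a little arithmetic. Degree-3 SH store (3 + 1)² = 16 coefficients per color channel, i.e. 48 per primitive. The reported NASGabor counts (12, 21, 39 for 1, 2, 4 lobes) fit 3 + 9L, consistent with a 3-value diffuse base color plus 9 parameters per lobe; that split is inferred from the numbers, not stated in this summary. The storage helper additionally assumes float32 values and ~1 million primitives.

```python
def sh_param_count(degree, channels=3):
    """Real SH up to `degree` have (degree + 1)**2 basis functions per channel."""
    return channels * (degree + 1) ** 2


def nasgabor_param_count(lobes):
    """ASSUMED breakdown: 3 diffuse RGB values + 9 parameters per lobe.
    Matches the reported totals (12, 21, 39 for 1, 2, 4 lobes); the split is inferred."""
    return 3 + 9 * lobes


def appearance_megabytes(params_per_primitive, primitives=1_000_000, bytes_per_param=4):
    """Appearance-only storage in MB, assuming float32 values and ~1M primitives."""
    return params_per_primitive * primitives * bytes_per_param / 1e6
```

By appearance parameters alone, one NASGabor lobe is 48/12 = 4× smaller than degree-3 SH (about 48 MB vs 192 MB at one million float32 primitives). These figures exclude geometry and opacity, so they do not reproduce the total storage numbers listed above, nor the paper's headline "up to 5×" figure, which may use a different accounting.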

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.09794.


Fig 1: We introduce a new spherical function, the Normalized Anisotropic Spherical Gabor (NASGabor), an anisotropic, multi-modal

Figs 2 to 8 (captions not available).

Limitations

NASGabor’s closed-form integral is computationally more expensive than simpler kernels, though an approximation reduces this with minor quality loss.
Multi-lobe formulations have diminishing returns beyond 2 lobes, with complex optimization landscapes causing lobe collapse and instability despite regularization.
The method’s improvements are more prominent in settings with dense and uniform camera coverage; robustness in extremely sparse or highly dynamic scenes remains less explored.
Comparison under strong distribution shifts or adversarial view perturbations is not provided, potentially limiting conclusions about robustness to challenging conditions.
The reported gains (~0.2-0.5 dB PSNR over SH) are modest, reflecting the limits of directional modeling improvements relative to the overall complexity of radiance reconstruction.
Some spherical functions evaluated suffer from expensive normalization computations, limiting their pragmatic use despite theoretical advantages.

Open questions / follow-ons

Can the proposed NASGabor model be extended or adapted to dynamic scenes or temporally varying view-dependent effects to improve robustness?
How do these advanced spherical functions perform under sparse or non-uniform camera distributions beyond nearest neighbor learning rate heuristics?
Can optimization strategies or regularizers be developed to better leverage multiple lobes without collapse, enabling more complex multi-modal appearance modeling?
What are the trade-offs between computation, memory, and reconstruction quality for very high-frequency effects beyond those tested, especially in large-scale outdoor datasets?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners dealing with view synthesis or rendering tasks, this paper provides insights into effective representation of directional appearance, which could be relevant if novel-view synthesis or radiance approximation is used for generating challenge images or testing visual perception. The introduction of anisotropic, multi-modal spherical functions that efficiently encode high-frequency view-dependent effects could inspire more compact and expressive appearance models for synthetic data generation. Additionally, the normalized function framework balancing compactness and optimization stability may be applicable to related rendering or image synthesis pipelines involved in bot-detection mechanisms. However, since the paper focuses on 3D radiance reconstruction rather than adversarial modeling or security-oriented image generation, direct application to CAPTCHA engineering is limited but could inform background rendering components in systems that rely on photorealistic challenge image generation.

Cite

bibtex

@article{arxiv2606_09794,
  title={Beyond Spherical Harmonics: Rethinking Appearance Models for Radiance Reconstruction},
  author={Ewa Miazga and Jorge Condor and Piotr Didyk},
  journal={arXiv preprint arXiv:2606.09794},
  year={2026},
  url={https://arxiv.org/abs/2606.09794}
}
