Spectral Handling and Estimation of AGN Parameters (SHEAP), The first AGN fitting GPU-based code
Source: arXiv:2606.03934 · Published 2026-06-02 · By F. Ávila-Vera, P. Sánchez-Sáez, V. Motta, S. Bernal
TL;DR
This paper introduces SHEAP (Spectral Handling and Estimation of AGN Parameters), the first GPU-accelerated spectral fitting framework for active galactic nuclei (AGN). Anticipating a large influx of AGN spectra from upcoming surveys, the authors designed SHEAP to process large, heterogeneous datasets efficiently while keeping the models physically interpretable and the uncertainty estimates robust. Built on the JAX Python framework, SHEAP combines modular spectral components (host galaxy, Fe II pseudo-continuum, Balmer continuum, and multiple emission lines) with gradient-based optimization that relies on automatic differentiation and just-in-time compilation. This enables stable, fast fitting of complex blended regions such as Hβ, and batch processing of many spectra at once on GPUs. The authors validate the method on roughly 2000 AGN spectra across four spectral regions (C IV, Mg II, Hβ, and Hα), finding strong agreement (85–100% of objects within ±0.3 dex) with literature results and public software. SHEAP's fitting runtime is approximately 1.7% of that of classical CPU implementations (notably pPXF), which is quoted as a roughly 100× speedup (direct inversion of 1.7% gives about 59×). Overall, SHEAP offers a scalable, flexible, and reproducible tool for the next generation of spectroscopic AGN surveys.
Key findings
- SHEAP achieves 85–100% of AGN spectral parameter measurements within ±0.3 dex of literature results across multiple independent samples and spectral regions (C IV, Mg II, Hβ, Hα).
- The reduced chi-square values of SHEAP fits cluster near unity, indicating statistically acceptable fits.
- Compared to pPXF CPU-based spectral fitting, SHEAP reduces computational time to ~1.7% on GPU hardware, quoted as a ~100× speedup (direct inversion of 1.7% gives about 59×).
- SHEAP models host-galaxy starlight with single stellar population templates convolved with a Gaussian LOSVD, Fe II emission via template convolution, the Balmer continuum with empirical templates, and multi-component emission lines with physically motivated parameter bounds.
- The optimization uses the gradient-based Adam method with a log-cosh loss for resistance to outliers, with gradients computed by automatic differentiation.
- Uncertainties are estimated robustly through Monte Carlo perturbation resampling with 50 realizations per spectrum, capturing parameter covariances and nonlinearities.
- Batch processing on GPU is implemented via JAX array transformations and just-in-time compilation, enabling simultaneous fitting of multiple spectra with consistent model assumptions.
- Post-fit derived quantities including line fluxes, equivalent widths, continuum luminosities, and single-epoch SMBH mass estimates incorporate propagated uncertainties from the fitting process.
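The post-fit step in the last finding can be illustrated with a minimal plain-Python sketch: a derived quantity is evaluated once per Monte Carlo realization and then summarized. The summary does not say which single-epoch calibration SHEAP adopts, so the Hβ relation of Vestergaard & Peterson (2006) is used here purely as an assumed example; the function names and sample values are hypothetical.

```python
import math
import statistics

def log_mbh_hbeta(fwhm_kms, l5100_erg_s):
    """log10(M_BH / M_sun) from FWHM(H-beta) and lambda*L_lambda(5100 A).

    Assumed calibration (Vestergaard & Peterson 2006); not confirmed as
    the relation used by SHEAP.
    """
    return (2.0 * math.log10(fwhm_kms / 1000.0)
            + 0.5 * math.log10(l5100_erg_s / 1e44)
            + 6.91)

def equivalent_width(line_flux, continuum_flux_density):
    """Equivalent width: integrated line flux over continuum flux density."""
    return line_flux / continuum_flux_density

def summarize(samples):
    """Median and sample standard deviation of per-realization values."""
    return statistics.median(samples), statistics.stdev(samples)

# Hypothetical best-fit values from five Monte Carlo realizations.
fwhm_samples = [3900.0, 4000.0, 4100.0, 4050.0, 3950.0]   # km/s
l5100_samples = [1.0e45, 1.1e45, 0.9e45, 1.0e45, 1.0e45]  # erg/s

# Propagate by evaluating the derived quantity on every realization.
mass_samples = [log_mbh_hbeta(f, l) for f, l in zip(fwhm_samples, l5100_samples)]
log_mass, log_mass_err = summarize(mass_samples)
```

Evaluating the derived quantity per realization, rather than applying linear error propagation to the best-fit values, carries any correlation between line width and continuum luminosity through to the mass estimate.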
Threat model
n/a — This work addresses astrophysical spectral modeling challenges rather than adversarial security threats. The 'adversary' is effectively observational noise, blending, and model degeneracy rather than a malicious attacker.
Methodology — deep read
The methodology centers on modeling rest-frame UV and optical AGN spectra as a sum of modular, physically motivated components: host-galaxy starlight, Fe II pseudo-continuum, AGN featureless continuum, Balmer continuum, and multi-component emission lines.
Threat Model & Assumptions: The method treats the spectral fitting challenge as a nonlinear optimization over observed spectra affected by blends, host contamination, and noise; no explicit adversarial context is included since this is an astrophysical modeling scenario rather than a security setting.
Data: SHEAP processes spectra stored as three-dimensional arrays representing batches of spectra with wavelength, flux density, and per-pixel uncertainty channels. Inputs come from public surveys including SDSS DR16 and other literature datasets totaling ~2000 AGN spectra across z~0.1–2.3, covering rest-frame 1100–7000 Å. Spectra are corrected for Galactic extinction and cosmological redshift, then resampled onto common wavelength grids preserving flux and uncertainties.
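As a rough illustration of the batch layout described above, the sketch below stores each spectrum as three channels (wavelength, flux density, uncertainty) and shifts it to the rest frame. The axis order (spectrum, channel, pixel), the function name, and the (1 + z) scaling of flux density per unit wavelength are assumptions for illustration; extinction correction and resampling are omitted.

```python
def to_rest_frame(spectrum, z):
    """Shift one [wavelength, flux, error] spectrum to the rest frame.

    Assumes flux density per unit wavelength, so flux and error are
    multiplied by (1 + z) while wavelength is divided by it.
    """
    wave, flux, err = spectrum
    scale = 1.0 + z
    return [
        [w / scale for w in wave],
        [f * scale for f in flux],
        [e * scale for e in err],
    ]

# Hypothetical batch of two spectra with three pixels each.
batch = [
    [[4000.0, 4400.0, 4800.0], [1.0, 2.0, 1.5], [0.1, 0.1, 0.2]],
    [[6000.0, 6600.0, 7200.0], [0.5, 0.8, 0.6], [0.05, 0.05, 0.05]],
]
redshifts = [0.1, 0.5]

rest_batch = [to_rest_frame(s, z) for s, z in zip(batch, redshifts)]

# (n_spectra, n_channels, n_pixels)
shape = (len(rest_batch), len(rest_batch[0]), len(rest_batch[0][0]))
```

Keeping every spectrum on the same pixel count is what makes a single rectangular array possible, which is why the resampling onto common grids matters for batching.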
Modeling Architecture:
- Host galaxy starlight is modeled as a linear combination of single stellar population (SSP) templates from the E-MILES library, spanning ages 0.1–10 Gyr and metallicities −2.32 to 0.22 dex. Templates are convolved in Fourier space with a Gaussian line-of-sight velocity distribution characterized by mean velocity V and dispersion σ.
- AGN continuum is flexible: linear baseline, single power-law, or broken power-law options.
- Balmer continuum emission modeled as a sum of parametric Balmer continuum and high-order Balmer pseudo-continuum using an empirical template (Bernal+2025).
- Fe II emission modeled either using empirical templates (Boroson & Green 1992, Vestergaard & Wilkes 2001) scaled, broadened, and velocity-shifted; or alternatively the atomic-data based FANTASY method for more complex Fe II modeling.
- Emission lines modeled using multiple Gaussian components per line, with physically motivated parameter bounds and ties (e.g., fixed flux or width ratios for narrow [O III]).
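A minimal sketch of how such components add up, assuming a single power-law continuum, broad Hβ built from two Gaussians, and a narrow [O III] doublet with shared kinematics and a fixed flux ratio. The function names, the parameter dictionary, the air rest wavelengths, and the 2.98 doublet ratio are illustrative assumptions, not SHEAP's actual interface; host-galaxy, Fe II, and Balmer-continuum terms are left out.

```python
import math

C_KMS = 299792.458   # speed of light in km/s
OIII_RATIO = 2.98    # assumed fixed [O III] 5007/4959 flux ratio

def power_law(wave, amp, slope, wave_ref=5100.0):
    """Continuum: amp * (wave / wave_ref) ** slope."""
    return amp * (wave / wave_ref) ** slope

def gaussian_line(wave, flux, rest_wave, v_shift_kms, sigma_kms):
    """Gaussian carrying integrated flux `flux`, parameterized in velocity."""
    center = rest_wave * (1.0 + v_shift_kms / C_KMS)
    sigma = center * sigma_kms / C_KMS
    norm = flux / (math.sqrt(2.0 * math.pi) * sigma)
    return norm * math.exp(-0.5 * ((wave - center) / sigma) ** 2)

def narrow_oiii(wave, flux_5007, v_shift_kms, sigma_kms):
    """Tied doublet: same shift and width, 4959 flux = flux_5007 / OIII_RATIO."""
    return (gaussian_line(wave, flux_5007, 5006.84, v_shift_kms, sigma_kms)
            + gaussian_line(wave, flux_5007 / OIII_RATIO, 4958.91,
                            v_shift_kms, sigma_kms))

def model(wave, p):
    """Sum of components at one wavelength (Angstrom)."""
    broad = sum(gaussian_line(wave, f, 4861.33, v, s)
                for f, v, s in p["hbeta_broad"])
    return (power_law(wave, p["cont_amp"], p["cont_slope"])
            + broad
            + narrow_oiii(wave, p["oiii_flux"], p["narrow_v"], p["narrow_sigma"]))

# Hypothetical parameter set: (flux, velocity shift, sigma) per broad component.
params = {
    "cont_amp": 1.0, "cont_slope": -1.5,
    "hbeta_broad": [(40.0, 0.0, 1500.0), (20.0, 300.0, 4000.0)],
    "oiii_flux": 10.0, "narrow_v": 0.0, "narrow_sigma": 150.0,
}
spectrum = [model(4700.0 + i, params) for i in range(500)]
```

Tying the doublet removes three free parameters (one flux, one shift, one width), which is the kind of constraint that helps keep blended regions such as Hβ well conditioned.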
Optimization:
- Fit procedure uses the Adam optimizer implemented via Optax in JAX, employing first-order gradients via automatic differentiation.
- The loss is a robust log-cosh loss on normalized residuals (difference between data and model scaled by uncertainties) to reduce sensitivity to outliers and artifacts.
- Parameter bounds are enforced by smooth reparameterizations, maintaining differentiability and optimizer stability.
- Batch optimization is performed simultaneously for all spectra in a given batch, leveraging JAX's vectorized operations and GPU acceleration.
- The standard fitting configuration uses 2000 iterations with a learning rate of 0.01.
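The ingredients listed above (log-cosh loss on normalized residuals, a smooth bound reparameterization, and Adam with 2000 iterations at learning rate 0.01) can be sketched in plain Python for a single bounded amplitude. This is a toy under stated assumptions: the gradient is written by hand and Adam is implemented directly, whereas SHEAP is described as using JAX automatic differentiation and Optax; the sigmoid mapping is one possible smooth reparameterization, not necessarily the one the authors chose, and the data are synthetic.

```python
import math

def log_cosh(r):
    """Numerically stable log(cosh(r))."""
    a = abs(r)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def to_bounded(u, lo, hi):
    """Map an unconstrained u smoothly into the open interval (lo, hi)."""
    return lo + (hi - lo) * sigmoid(u)

# Synthetic noiseless line with true amplitude 3.0 (hypothetical data).
wave = [4800.0 + i for i in range(121)]
profile = [math.exp(-0.5 * ((w - 4861.0) / 10.0) ** 2) for w in wave]
flux = [3.0 * g for g in profile]
err = [0.1] * len(wave)
LO, HI = 0.0, 10.0   # assumed amplitude bounds

def loss_and_grad(u):
    """Log-cosh loss on normalized residuals and its derivative w.r.t. u."""
    s = sigmoid(u)
    amp = LO + (HI - LO) * s
    damp_du = (HI - LO) * s * (1.0 - s)      # chain rule through the bound map
    loss, grad = 0.0, 0.0
    for y, g, e in zip(flux, profile, err):
        r = (y - amp * g) / e
        loss += log_cosh(r)
        grad += math.tanh(r) * (-g / e) * damp_du   # d log cosh(r)/dr = tanh(r)
    return loss, grad

def adam(u, steps=2000, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Scalar Adam update with bias correction."""
    m = v = 0.0
    for t in range(1, steps + 1):
        _, g = loss_and_grad(u)
        m = b1 * m + (1.0 - b1) * g
        v = b2 * v + (1.0 - b2) * g * g
        m_hat = m / (1.0 - b1 ** t)
        v_hat = v / (1.0 - b2 ** t)
        u -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return u

u_fit = adam(0.0)                      # start at the middle of the bounds
amp_fit = to_bounded(u_fit, LO, HI)
```

Because the optimizer only ever sees the unconstrained variable `u`, the amplitude cannot leave its bounds and no clipping or projection step is needed, which keeps the loss differentiable everywhere.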
Uncertainty Estimation:
- Uncertainties on best-fit parameters and derived quantities are estimated via Monte Carlo perturbation resampling, generating 50 synthetic perturbed spectra per object, each refit with the same initial conditions.
- The distribution of refit parameters captures nonlinear covariances and uncertainty propagation.
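The resampling idea can be sketched as follows: each realization adds Gaussian noise drawn from the per-pixel uncertainties, the perturbed spectrum is refit, and the spread of the refit values gives the uncertainty. To keep the sketch short, a closed-form weighted least-squares amplitude stands in for the full gradient-based refit; the function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics

def fit_amplitude(flux, profile, err):
    """Weighted least-squares amplitude of a fixed profile.

    Stand-in for a full refit; SHEAP is described as rerunning its
    optimizer from the same initial conditions instead.
    """
    num = sum(y * g / e ** 2 for y, g, e in zip(flux, profile, err))
    den = sum(g * g / e ** 2 for g, e in zip(profile, err))
    return num / den

def mc_samples(flux, profile, err, n_real=50, seed=0):
    """Refit n_real noise-perturbed copies of the spectrum."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_real):
        perturbed = [rng.gauss(y, e) for y, e in zip(flux, err)]
        samples.append(fit_amplitude(perturbed, profile, err))
    return samples

# Synthetic line with true amplitude 3.0 (hypothetical data).
wave = [4800.0 + i for i in range(121)]
profile = [math.exp(-0.5 * ((w - 4861.0) / 10.0) ** 2) for w in wave]
flux = [3.0 * g for g in profile]
err = [0.1] * len(wave)

samples = mc_samples(flux, profile, err)
amp_med = statistics.median(samples)
amp_err = statistics.stdev(samples)

# For this linear toy problem the expected scatter is known analytically.
analytic_err = 1.0 / math.sqrt(sum(g * g / e ** 2 for g, e in zip(profile, err)))
```

With only 50 realizations the estimated scatter itself carries roughly 10% statistical noise, so it should be read as an approximate uncertainty rather than a precise one.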
Evaluation Protocol:
- The authors validate SHEAP by comparing parameter measurements (line widths, fluxes, continuum luminosities) to multiple independent published measurements on AGN spectral samples from SDSS DR16 (Wu & Shen 2022), Pan+2025, Bernal+2025, and Sánchez-Sáez+2018.
- Agreement is quantified by the fraction of objects whose absolute logarithmic parameter difference is ≤0.3 dex, the median offset, and the normalized median absolute deviation (NMAD).
- Reduced chi-square statistics are evaluated as fit quality metrics.
- Performance (runtime) comparisons are made against pPXF CPU-based spectral fitting.
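The agreement statistics named above can be computed as in the sketch below, which takes two lists of log10 measurements for the same objects. The 1.4826 scaling in the NMAD is the conventional factor that makes it match the standard deviation for Gaussian data and is assumed here; the function name and the sample values are hypothetical.

```python
import statistics

def agreement_stats(log_new, log_ref, tol_dex=0.3):
    """Fraction within tol_dex, median offset, and NMAD of log differences."""
    delta = [a - b for a, b in zip(log_new, log_ref)]
    frac_within = sum(abs(d) <= tol_dex for d in delta) / len(delta)
    median_offset = statistics.median(delta)
    abs_dev = [abs(d - median_offset) for d in delta]
    nmad = 1.4826 * statistics.median(abs_dev)
    return frac_within, median_offset, nmad

# Hypothetical log10(FWHM / km s^-1) values for five objects.
log_fwhm_new = [3.50, 3.70, 3.20, 4.10, 3.90]
log_fwhm_ref = [3.45, 3.60, 3.60, 4.10, 3.70]

frac, offset, nmad = agreement_stats(log_fwhm_new, log_fwhm_ref)
```

In this toy sample one object differs by 0.40 dex, so four of five fall inside the tolerance, while the median offset and NMAD are barely affected by that outlier.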
Reproducibility:
- SHEAP is publicly documented, with code and examples, at https://sheap.readthedocs.io/. The datasets are publicly available SDSS spectra, and an interface to standard spectral templates is provided. The exact code version used for the paper is not stated, but the method relies on fully open software libraries (JAX, Optax).
Example end-to-end: A batch of 500 SDSS DR16 quasar spectra covering the C IV emission region is corrected for Galactic extinction and redshift, resampled onto a uniform wavelength grid, then modeled with a power-law continuum plus 3 Gaussian C IV components, plus host galaxy and Fe II templates. The Adam optimizer runs 2000 iterations minimizing the log-cosh loss on the uncertainty-normalized residuals between data and model, fitting the full batch simultaneously in a single compiled JAX call on an NVIDIA RTX A5000 GPU. The best-fit parameters are extracted, then 50 Monte Carlo perturbations of each spectrum are refit to estimate parameter uncertainties and covariances. Validation shows roughly 90% of C IV line widths within ±0.3 dex of prior literature values. Runtime is roughly 1.7% of the CPU-based baseline.
Technical innovations
- First GPU-accelerated AGN spectral-fitting code leveraging JAX for automatic differentiation and just-in-time compilation enabling ~100× speedup over CPU-based methods (e.g., pPXF).
- Modular, physically motivated spectral model combining host galaxy SSP templates convolved with Gaussian LOSVD, multi-component emission lines, Fe II pseudo-continuum templates, and flexible AGN continuum models under a unified differentiable framework.
- Gradient-based Adam optimizer with smooth bound-handling and robust log-cosh loss applied simultaneously over large batches of spectra utilizing vectorized JAX operations for stable, scalable fitting.
- Monte Carlo perturbation resampling integrated in GPU-accelerated pipeline for robust uncertainty estimation capturing parameter nonlinearities and correlations.
Datasets
- SDSS DR16 quasars (Wu & Shen 2022 subset) — ~500 spectra — public SDSS data
- Pan+2025 sample — 500 spectra — literature sample
- Bernal+2025 sample — 413 spectra — literature sample
- Sánchez-Sáez+2018 sample — 151 spectra — literature sample
Baselines vs proposed
- pPXF CPU fitting: runtime = 100% vs SHEAP: runtime = 1.7% (quoted as ~100× faster; 1/0.017 ≈ 59×)
- SHEAP vs Wu & Shen 2022: ~90% of C IV parameters within ±0.3 dex
- SHEAP vs Bernal+25: 85–95% agreement for Hβ and Hα parameters within ±0.3 dex
- Reduced chi-square: mean ~1.0 for SHEAP fits across all samples compared against literature fits
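Two of the numbers in this comparison follow from short formulas, sketched here in plain Python with hypothetical helper names and toy data. The reduced chi-square divides the summed squared normalized residuals by the degrees of freedom, and a runtime fraction converts to a speedup factor by inversion.

```python
def reduced_chi_square(flux, model, err, n_free):
    """Chi-square per degree of freedom (pixels minus free parameters)."""
    chi2 = sum(((y - m) / e) ** 2 for y, m, e in zip(flux, model, err))
    return chi2 / (len(flux) - n_free)

def speedup_from_runtime_fraction(fraction):
    """Speedup factor implied by a runtime expressed as a fraction of baseline."""
    return 1.0 / fraction

# Toy data: five pixels, one free parameter.
flux = [1.0, 2.1, 2.9, 4.2, 5.0]
model = [1.1, 2.0, 3.0, 4.0, 5.1]
err = [0.1] * 5

chi2_red = reduced_chi_square(flux, model, err, n_free=1)
speedup = speedup_from_runtime_fraction(0.017)   # about 58.8
```

A reduced chi-square near 1 indicates residuals consistent with the quoted uncertainties; the toy value here is 2.0 because one pixel is off by two sigma.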
Limitations
- Current host galaxy modeling uses only Gaussian LOSVD without higher-order Gauss–Hermite moments, limiting stellar kinematic detail.
- Fe II modeling primarily based on empirical templates may lack flexibility in complex spectral regions; atomic-data based FANTASY approach included but adds computational complexity.
- No explicit adversarial or outlier contamination testing beyond robust loss; heavily blended or low S/N spectra may challenge fits.
- Validation datasets do not include extremely low S/N or unusual AGN spectral types beyond studied samples.
- Uncertainty quantification via MC perturbations assumes Gaussian noise statistics; systematic uncertainties from model assumptions not fully explored.
- Code reproducibility and exact runtime comparisons depend on hardware; real survey-scale performance remains to be demonstrated beyond ~2000 spectra tests.
Open questions / follow-ons
- How does SHEAP handle extremely low signal-to-noise or highly blended multi-component AGN spectra in next-generation survey data?
- Can the inclusion of non-Gaussian LOSVD moments improve host galaxy subtraction without compromising computation speed?
- What are the limits of the empirical Fe II templates, and how can atomic-data-based models be efficiently incorporated for broader wavelength ranges?
- How will the method scale and integrate with real-time survey workflows processing millions of spectra continuously?
Why it matters for bot defense
While this paper does not directly address bot defense or CAPTCHA, its core contribution—high-throughput, scalable, GPU-accelerated spectral fitting with robust uncertainty estimation—demonstrates a general paradigm valuable for processing massive heterogeneous data efficiently. Bot defense practitioners can draw parallels to SHEAP’s use of automatic differentiation, vectorized batch processing, and modular, interpretable modeling with uncertainty quantification to build scalable real-time systems that must operate on noisy, complex signals under constrained computation budgets. Similarly, the robust loss function and MC perturbation resampling offer strategies for resilient inference despite noisy inputs or outliers, applicable in bot detection pipelines. The emphasis on physical interpretability combined with computational efficiency provides a useful blueprint for designing bot-defense methods that require explainability alongside scalability. Overall, the paper offers valuable insights into GPU-accelerated gradient-based modeling frameworks and uncertainty quantification techniques that can inspire approaches in CAPTCHA reliability and bot detection systems facing large-scale noisy data.
Cite
@article{arxiv2606_03934,
title={Spectral Handling and Estimation of AGN Parameters (SHEAP), The first AGN fitting GPU-based code},
author={F. Ávila-Vera and P. Sánchez-Sáez and V. Motta and S. Bernal},
journal={arXiv preprint arXiv:2606.03934},
year={2026},
url={https://arxiv.org/abs/2606.03934}
}