Adaptive Derivative Estimation via Stein's Unbiased Risk

Source: arXiv:2606.09829 · Published 2026-06-08 · By Yonathan Murin, Ali Ozer Ercan

TL;DR

This work addresses the classical problem of causal derivative estimation from noisy discrete-time data, which is fundamental in control, HCI, and biomedical applications. With causal FIR derivative filters, short smoothing windows amplify noise (high variance), while long windows distort the signal (high bias). The authors propose SURDE (SURE Derivative Estimator), an adaptive method that uses Stein’s Unbiased Risk Estimator (SURE) to select or aggregate filter lengths at each time step in a data-driven manner. SURDE evaluates an unbiased MSE estimate for each candidate window length and softly combines the candidate outputs using exponential weights whose optimal temperature is derived in closed form. This removes the heuristic tuning that prior methods require.

Theoretical analysis proves a minimax-optimal oracle inequality: the risk of the adaptive estimator is within O(σ²√log K) of that of an oracle that knows the best filter length beforehand. Extensive experiments on synthetic multi-frequency signals and real-world robotic trajectory data from the EuRoC MAV dataset show that SURDE consistently outperforms established adaptive estimators such as the Intersection of Confidence Intervals (ICI) rule and the Adaptive Windowing Velocity Estimator (AWVE). SURDE’s soft-combining variant yields smoother estimates with lower MSE across noise regimes. It is also robust to noise variance misspecification, degrading by only around 9% over a 4× noise variance range, compared with 36% for ICI and 222% for AWVE. The SURE cost is evaluated in O(N) per candidate window length, versus O(N²) for AWVE, which makes real-time use practical. This work advances adaptive derivative estimation by replacing heuristic adaptation with an unbiased risk-driven framework that has rigorous guarantees.

Key findings

SURDE’s soft-combining estimator achieves an oracle inequality with excess risk bounded by O(σ²√log K), where K is the number of candidate window lengths.
In Monte Carlo simulations, SURDE yields about 4× lower MSE than AWVE and 8% lower MSE than ICI at medium noise (σ = 0.05).
SURDE degrades gracefully with noise variance misspecification, showing only 9% performance loss over a 4× noise variance range, versus 36% for ICI and 222% for AWVE.
In real-world EuRoC MAV trajectory data experiments, SURDE outperforms ICI and AWVE, achieving the lowest estimation error among adaptive methods without requiring retuning.
SURDE reduces the computational cost per candidate window length from O(N²) (AWVE) to O(N), a 23× reduction when tested with window lengths up to 80.
The optimal temperature for the soft-combining weights can be computed in closed form, leaving noise variance as the sole tuning parameter.
SURDE’s bias-variance tradeoff adaptation dynamically tracks the local signal-to-noise ratio better than fixed-length or heuristic methods.
Experimental results confirm SURDE’s adaptation aggressively favors shorter windows during fast signal changes and longer windows in slow segments, achieving near-optimal bias and variance per segment.

Threat model

Not applicable; the work addresses noise-corrupted derivative estimation rather than adversarial attacks or security threats. The noise is modeled as Gaussian with known statistics, and no active attackers or manipulations are considered.

Methodology — deep read

Threat Model & Assumptions: The goal is to estimate the n-th order derivative of a deterministic continuous-time signal observed through noisy, discrete-time samples. The noise is modeled as zero-mean Gaussian with known variance and covariance structure, possibly colored but stationary. The estimation must operate causally, using only current and past samples. The adversary here is noise corruption; no active attacker model is considered.
Data: Synthetic signals comprise concatenated sinusoidal segments with periods 15, 40, and 100 to span diverse temporal scales. Observations are corrupted by Gaussian noise at levels σ ∈ {0.005, 0.05, 0.15}. Real-data experiments use EuRoC MAV visual-inertial trajectories with noisy ground truth for performance testing. Monte Carlo simulations average over 500 trials per noise level.
Architecture/Algorithm: A bank of causal linear FIR derivative filters is constructed, each performing a least-squares polynomial fit of degree p ≥ n over a window of length N, with N varying across the bank. Each filter estimates the n-th derivative at the window's right edge. At each time step, SURDE computes a SURE-based cost c_k(N) for each candidate window length, which is an unbiased estimate of the MSE of that filter’s output. Two variants are proposed:

Hard combining: select the window length that minimizes c_k(N) directly.
Soft combining: combine the candidate estimates using exponential weights w(N) ∝ exp(−c_k(N)/T), with the temperature T chosen analytically from variance bounds to balance bias and smoothing.

The key novelty is the derivation of a closed-form unbiased MSE estimate (Eq. 9) for windowed FIR derivative filters, which enables direct risk minimization rather than heuristic thresholding.
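
The two combining rules described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the `costs` array stands in for the SURE-based costs c_k(N) (the closed-form expression of Eq. 9 is not reproduced in this summary), and `temperature` is passed in by the caller because the paper's closed-form choice of T is likewise not reproduced here.

```python
import numpy as np

def hard_combine(estimates, costs):
    """Return the candidate estimate whose cost is smallest."""
    estimates = np.asarray(estimates, dtype=float)
    costs = np.asarray(costs, dtype=float)
    return float(estimates[np.argmin(costs)])

def soft_combine(estimates, costs, temperature):
    """Exponentially weighted average with w proportional to exp(-cost / T).

    `costs` is a placeholder for the per-window SURE costs and
    `temperature` is supplied by the caller (both are assumptions
    of this sketch).
    """
    estimates = np.asarray(estimates, dtype=float)
    costs = np.asarray(costs, dtype=float)
    # Subtracting the minimum cost leaves the normalized weights unchanged
    # and avoids overflow in the exponential.
    w = np.exp(-(costs - costs.min()) / temperature)
    w /= w.sum()
    return float(w @ estimates), w
```

As T shrinks, the soft rule approaches the hard rule, and as T grows, it approaches a plain average of the candidates, which is why the choice of temperature governs how smooth the combined estimate is.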

Training Regime: Not applicable, as this is an online adaptive filtering algorithm. Pre-computation includes calculation and storage of filter coefficients v(N), auxiliary s-vectors, and covariance projections. Per-sample computation entails applying filter convolutions and evaluating costs for each candidate window length, followed by either choosing or weighting estimates.
Evaluation Protocol: Metrics include per-sample MSE comparing estimated derivatives against noise-free ground truth across signal segments. Baseline methods include fixed-length LS filters, the ICI rule, AWVE, a constant-velocity Kalman filter, total-variation differentiation, and non-causal Savitzky–Golay. Simulations vary noise levels and assess statistical significance. Real-data evaluations check robustness and generalization. Ablations include noise variance misspecification and computational cost comparison.
Reproducibility: The paper provides algorithmic pseudocode (Algorithms 1 and 2) and closed-form formulae for key steps. No code release is mentioned. The method is fully specified and can be implemented with standard linear algebra tools. Synthetic data generation is described, and the EuRoC MAV dataset is publicly available.
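
To make the filter bank in the methodology concrete, the sketch below builds causal least-squares polynomial derivative filters that evaluate the n-th derivative at the window's right edge. It uses the standard Vandermonde least-squares construction; the function names, the sample spacing `dt`, and the default degree are assumptions of this sketch, and it does not include the paper's auxiliary s-vectors or covariance projections.

```python
import math
import numpy as np

def ls_derivative_weights(N, p=1, n=1, dt=1.0):
    """FIR weights (oldest sample first) for the n-th derivative at the
    newest sample, from a degree-p least-squares fit over N samples."""
    if p < n or N < p + 1:
        raise ValueError("need p >= n and N >= p + 1")
    # Local time axis: 0 at the right edge, negative for past samples.
    t = -dt * np.arange(N - 1, -1, -1.0)
    A = np.vander(t, p + 1, increasing=True)   # columns t^0 ... t^p
    # The fitted coefficient a_n is row n of pinv(A) applied to the window;
    # the n-th derivative of the polynomial at t = 0 is n! * a_n.
    return math.factorial(n) * np.linalg.pinv(A)[n]

def filter_bank(lengths, p=1, n=1, dt=1.0):
    """Precompute the weights for every candidate window length."""
    return {N: ls_derivative_weights(N, p, n, dt) for N in lengths}

def apply_bank(bank, y):
    """Causal estimates from the most recent samples of y, one per window."""
    y = np.asarray(y, dtype=float)
    return {N: float(v @ y[-N:]) for N, v in bank.items() if y.size >= N}

def noise_gain(v):
    """Output variance per unit input variance for white noise."""
    return float(np.sum(v ** 2))
```

For white noise, `noise_gain` shrinks as N grows, while the fit's bias grows on fast-changing signals. This is the tradeoff that the adaptive window selection is meant to resolve.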

Technical innovations

Derivation of a closed-form, per-sample unbiased MSE estimator for causal FIR derivative filters using Stein’s Unbiased Risk Estimator (SURE), enabling principled data-driven adaptive filter length selection.
Formulation of a soft-combining scheme that exponentially weights candidate filter outputs using the SURE-based costs, with a closed-form optimal temperature parameter minimizing risk.
Proof of a minimax-optimal oracle inequality bounding the excess risk of the adaptive estimator to O(σ²√log K), outperforming prior multiplicative bounds like those for the ICI rule.
Computational complexity reduction by evaluating the SURE cost in O(N) per candidate window length, compared to O(N²) for comparable residual-based adaptive bandwidth selectors like AWVE.
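
The oracle guarantee can be written schematically as an additive bound. The notation here is assumed for illustration rather than taken from the paper: d is the true derivative at a given time step, \hat{d}_k is the output of the k-th candidate filter, \hat{d}_{\mathrm{SURDE}} is the adaptive estimate, and C is an unspecified constant.

```latex
\mathbb{E}\big[(\hat{d}_{\mathrm{SURDE}} - d)^2\big]
  \;\le\; \min_{1 \le k \le K} \mathbb{E}\big[(\hat{d}_k - d)^2\big]
  \;+\; C\,\sigma^2 \sqrt{\log K}
```

The excess term is added to the oracle risk and grows only with √log K, whereas a multiplicative bound scales the oracle risk itself by a constant factor.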

Datasets

Synthetic multi-frequency sinusoid signals — 600 samples per trial, 500 Monte Carlo trials, Gaussian noise added — generated by authors
EuRoC MAV Dataset — Real robotic visual-inertial trajectories — publicly available
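
A synthetic trial of the kind described above could be generated roughly as follows. The summary gives only the periods (15, 40, 100), the trial length (600 samples), and the noise levels, so the equal 200-sample segments, unit amplitude, unit sample spacing, segment order, and continuous phase across segments are all assumptions of this sketch, as is the function name `make_trial`.

```python
import numpy as np

def make_trial(sigma, periods=(15, 40, 100), seg_len=200, rng=None):
    """Return (noisy, clean, true_derivative) for one synthetic trial.

    Segment length, amplitude, ordering and phase continuity are
    assumptions; only the periods and noise level come from the summary.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Angular frequency per sample, held constant within each segment.
    omega = np.repeat(2.0 * np.pi / np.asarray(periods, dtype=float), seg_len)
    phase = np.cumsum(omega)            # continuous phase across segments
    clean = np.sin(phase)
    true_derivative = omega * np.cos(phase)
    noisy = clean + sigma * rng.standard_normal(clean.size)
    return noisy, clean, true_derivative
```

Passing a seeded generator, for example `make_trial(0.05, rng=np.random.default_rng(0))`, makes a Monte Carlo run repeatable.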

Baselines vs proposed

ICI rule: overall MSE at σ=0.05 = 6.00×10⁻³ vs SURDE soft-combining = 5.52×10⁻³
Adaptive Windowing Velocity Estimator (AWVE): overall MSE at σ=0.05 = 21.9×10⁻³ vs SURDE = 5.52×10⁻³
Fixed LS filter N=4: overall MSE at σ=0.05 = 5.42×10⁻³ vs SURDE = 5.52×10⁻³ (fixed filter about 2% lower)
Kalman filter: overall MSE at σ=0.05 = 2.57×10⁻³ vs SURDE = 5.52×10⁻³ (Kalman is lower at this noise level; SURDE is reported to do better at higher noise)
Noise variance misspecification (4× range): AWVE performance degrades by 222%, ICI by 36%, SURDE by only 9%
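
The relative figures quoted in the key findings follow from the σ = 0.05 numbers listed above. The short check below recomputes them; the dictionary keys and helper names are illustrative only.

```python
# Overall MSE at sigma = 0.05, as listed in this section.
mse = {
    "SURDE": 5.52e-3,
    "ICI": 6.00e-3,
    "AWVE": 21.9e-3,
    "LS N=4": 5.42e-3,
    "Kalman": 2.57e-3,
}

def ratio_to_surde(name):
    """How many times larger the baseline MSE is than SURDE's."""
    return mse[name] / mse["SURDE"]

def reduction_vs(name):
    """Fractional MSE reduction of SURDE relative to the baseline."""
    return 1.0 - mse["SURDE"] / mse[name]

for name in ("AWVE", "ICI", "LS N=4", "Kalman"):
    print(f"{name}: ratio {ratio_to_surde(name):.2f}, "
          f"SURDE reduction {reduction_vs(name):+.1%}")
```

The AWVE ratio comes out at about 3.97 (the "4×" figure) and the reduction against ICI at 8%. The fixed N=4 filter and the Kalman filter show negative reductions, meaning they have lower MSE than SURDE at this particular noise level.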

Limitations

SURDE requires accurate or at least rough knowledge of noise variance; performance depends on this parameter, though it is robust to moderate misspecification.
The method assumes Gaussian noise; applicability to non-Gaussian or heavy-tailed noise distributions is not tested or guaranteed.
No explicit adversarial robustness analysis; performance under active or structured noise/anomalies is not studied.
Evaluation on real datasets is limited to EuRoC MAV trajectories; broader validation on diverse real-world signals from other domains would strengthen claims.
The proposed SURE cost and oracle guarantees are derived for linear FIR filters; extension to nonlinear estimators is non-trivial and not addressed.
The approach focuses on first-derivative estimation; extension to higher-order derivatives is suggested, but empirical validation for them is limited.

Open questions / follow-ons

How does SURDE perform under non-Gaussian noise or transient artifacts beyond stationary Gaussian assumptions?
Can the SURE-based adaptive framework be extended to nonlinear or state-space derivative estimators?
What are the tradeoffs and potential gains from jointly adapting polynomial order p alongside window length N?
How will SURDE behave in multi-dimensional signal settings or with vector-valued derivatives, e.g., spatio-temporal data?

Why it matters for bot defense

Adaptive derivative estimation is crucial in any real-time signal processing pipeline facing noisy inputs — a common scenario in bot detection signals derived from mouse/keyboard dynamics, touch gestures, or sensor streams in CAPTCHA and fraud detection. SURDE provides a rigorous method to dynamically adjust smoothing parameters based on unbiased risk estimates, which can improve the accuracy and responsiveness of velocity or acceleration features used in behavioral biometric bot defense models. Its causal, computationally efficient design suits deployment in low-latency environments where adaptive noise conditions prevail. Moreover, SURDE’s robustness to noise variance misspecification reduces the need for fine calibration, beneficial in practical web-scale CAPTCHA deployments facing diverse device noise profiles. However, the method assumes Gaussian noise and linear filters, so bot-defense engineers should carefully validate on real interaction data to check noise distribution conformity and explore possible model extensions. Overall, SURDE offers a promising principled alternative to heuristic derivative smoothing commonly used in CAPTCHA feature extraction and could lead to more stable and adaptive bot detection signals.

Cite

@article{arxiv2606_09829,
  title={Adaptive Derivative Estimation via Stein's Unbiased Risk},
  author={Yonathan Murin and Ali Ozer Ercan},
  journal={arXiv preprint arXiv:2606.09829},
  year={2026},
  url={https://arxiv.org/abs/2606.09829}
}
