Online Bayesian Calibration under Gradual and Abrupt System Changes
Source: arXiv:2605.06612 · Published 2026-05-07 · By Yang Xu, Chiwoo Park
TL;DR
This paper addresses the challenge of performing online Bayesian calibration of computer simulators in dynamic, nonstationary environments typical of digital twin applications. Classical Bayesian calibration techniques are offline and assume stationary data-generating processes, which limits their applicability when systems experience both gradual drifts and abrupt regime shifts over time. The authors propose Bayesian Recursive Projected Calibration (BRPC), a novel online framework that separates parameter updates from systematic bias (discrepancy) learning by first performing a discrepancy-free particle filtering update of calibration parameters followed by a conditional Gaussian process update of the discrepancy. This design preserves parameter–discrepancy identifiability and allows for principled bias-aware adaptation under temporal nonstationarity. To handle abrupt system changes, BRPC incorporates restart mechanisms driven by change-point detection criteria to reset calibration states and prevent bias accumulation from outdated data. The paper rigorously analyzes the theoretical tracking guarantees of BRPC under gradual drift and characterizes the statistical behavior of restart rules. Empirically, BRPC and its restart-augmented variants outperform sliding-window Bayesian calibration and data assimilation baselines in complex synthetic and plant-simulation digital twin benchmarks, achieving lower parameter estimation and prediction errors alongside accurate regime change detection.
Key findings
- BRPC reduces parameter estimation RMSE (θ-RMSE) by an order of magnitude relative to sliding-window BC(80), e.g., 0.014 vs 0.142 on the drifting synthetic benchmark.
- BRPC consistently lowers response prediction RMSE relative to BC(80) and data assimilation (DA), e.g., 0.484 vs 0.622 and 1.909, respectively, on synthetic drifting data.
- Restart-augmented BRPC (B-BRPC-RRA) achieves near-perfect recall and high precision when detecting abrupt regime shifts (Recall@2 ≈ 0.999, Precision@2 ≈ 0.986 on the Sudden 3 synthetic scenario).
- C-BRPC achieves comparable parameter RMSE to B-BRPC but with fewer restart events and improved restart event precision.
- Theoretical tracking bound (Theorem 1) ensures cumulative residual loss relative to any gradual reference path is controlled under stable discrepancy propagation.
- KL-regularized parameter updates (Eq.6) and conditional Gaussian process discrepancy posterior updates (Eq.11) preserve parameter–discrepancy identifiability online.
- Restart mechanisms based on BOCPD and CUSUM-style score testing control false alarm rates and detection delay, with tuning parameters influencing sensitivity.
- Empirical results demonstrate robustness across different modes of nonstationarity: drifting, sudden regime shifts, and mixed scenarios.
Threat model
The paper has no threat model in the security sense. The "adversary" is natural environmental and operational variability: unknown, time-varying physical system dynamics that drift gradually or shift abruptly, causing simulator mismatch and nonstationarity. No attacker attempts to deceive the calibration system or inject false data. The method assumes honest observations corrupted only by Gaussian noise, and it must track the system's evolution online. Abrupt regime changes are treated as tracking challenges rather than adversarial attacks.
Methodology — deep read
The paper tackles online Bayesian calibration of nonstationary, biased simulators. The underlying physical system and its simulator evolve over time with unknown time-varying calibration parameters θ_t and discrepancy δ_t(x), both of which must be estimated from streaming noisy observations of the form y_t = simulator output + discrepancy + noise. The only source of change is the environment itself; adversarial manipulation is out of scope.
Data arrive as sequential batches B_t of input-output pairs (X_t, Y_t), drawn either from synthetic mathematical functions or from a plant-simulation digital twin benchmark. Batch sizes vary, and several nonstationarity scenarios are constructed: smooth drift in ω_t, abrupt step changes, and mixtures of the two. Public datasets are unavailable; the synthetic and plant-simulation benchmarks are custom.
The core contribution is Bayesian Recursive Projected Calibration (BRPC), an online recursive filtering algorithm. BRPC separates updates of calibration parameters and discrepancy:
- The calibration parameters θ_t are defined via projected calibration as the minimizer of the L2 distance between the physical response and the simulator output, integrated over the input distribution. Their posterior is approximated by particles, using a transition prior p(θ_t|θ_{t−1}) and a discrepancy-free likelihood p_proj(Y_t|X_t, θ_t). The update is a KL-regularized optimization, solved by weighted particle-filter resampling.
- Conditional on θ_t, the discrepancy function δ_t(x) is modeled as a Gaussian process. Its coefficients are updated by a linear-Gaussian Bayesian update in particle-specific residual spaces, with a tempering parameter η_δ that controls adaptation speed.
By conditioning discrepancy on θ_t posterior particles, this two-stage update mitigates parameter–discrepancy confounding and preserves identifiability in the online setting.
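The two-stage update described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the toy simulator `sin(theta * x)`, the random-walk transition, the fixed RBF basis standing in for a Gaussian process, and every name and default value (`brpc_like_step`, `drift_sd`, `noise_sd`, `eta_delta`) are hypothetical. For brevity the discrepancy is conditioned on the particle mean rather than on each particle separately, which is a simplification of the particle-specific update the paper describes.

```python
import numpy as np

def simulator(x, theta):
    # Hypothetical toy simulator; the paper's simulators are not reproduced here.
    return np.sin(theta * x)

def rbf_features(x, centers, width=0.5):
    # Fixed RBF basis: a finite-dimensional stand-in for a GP discrepancy.
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def brpc_like_step(particles, x, y, coef_mean, coef_cov, centers, rng,
                   drift_sd=0.05, noise_sd=0.2, eta_delta=0.5):
    # Stage 1a: propagate particles through a random-walk transition prior.
    proposed = particles + rng.normal(0.0, drift_sd, size=particles.shape)
    # Stage 1b: weight by a discrepancy-free Gaussian likelihood of the batch.
    resid = y[None, :] - simulator(x[None, :], proposed[:, None])
    log_w = -0.5 * np.sum(resid ** 2, axis=1) / noise_sd ** 2
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    # Stage 1c: multinomial resampling according to the weights.
    idx = rng.choice(len(proposed), size=len(proposed), p=w)
    particles = proposed[idx]
    theta_hat = particles.mean()

    # Stage 2: tempered linear-Gaussian update of the discrepancy coefficients,
    # conditional on theta_hat (simplification: one shared residual space).
    phi = rbf_features(x, centers)
    r = y - simulator(x, theta_hat)
    prior_prec = np.linalg.inv(coef_cov)
    post_prec = prior_prec + eta_delta * (phi.T @ phi) / noise_sd ** 2
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ coef_mean
                            + eta_delta * (phi.T @ r) / noise_sd ** 2)
    return particles, post_mean, post_cov

# Small driver on synthetic data with a mild linear bias (all values assumed).
rng = np.random.default_rng(0)
centers = np.linspace(0.0, 3.0, 6)
particles = rng.uniform(0.5, 2.5, size=500)
coef_mean, coef_cov = np.zeros(6), np.eye(6)
for t in range(10):
    x = rng.uniform(0.0, 3.0, size=20)
    y = simulator(x, 1.5) + 0.1 * x + rng.normal(0.0, 0.2, size=20)
    particles, coef_mean, coef_cov = brpc_like_step(
        particles, x, y, coef_mean, coef_cov, centers, rng)
```

The point of the ordering is that the parameter weights never see the discrepancy term, so the discrepancy cannot absorb what is really a parameter shift. Setting `eta_delta` below 1 down-weights each batch in the discrepancy update, which slows adaptation.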
To handle abrupt regime shifts, three restart mechanisms are integrated:
- B-BRPC uses Bayesian online change point detection (BOCPD) maintaining a mixture of restart hypotheses and triggers restarts based on posterior weights exceeding thresholds.
- C-BRPC monitors a CUSUM-like score computed from the negative log predictive likelihoods and triggers restarts when this exceeds a threshold, supporting more computationally efficient single-expert deployment.
- B-BRPC-RRA enhances B-BRPC by re-fitting the discrepancy offline after restarts from residuals computed using anchored calibration parameters to improve discrepancy estimation stability.
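The score-based restart idea behind C-BRPC can be illustrated with a textbook one-sided CUSUM on per-batch negative log predictive likelihoods. This sketch is not the paper's wCUSUM statistic: the class name, the known `baseline`, the `allowance`, and the `threshold` are all assumptions chosen for illustration.

```python
class CusumRestartMonitor:
    """One-sided CUSUM on per-batch negative log predictive likelihood (NLL)."""

    def __init__(self, baseline, allowance=0.5, threshold=6.0):
        self.baseline = baseline      # expected NLL when no change has occurred (assumed known)
        self.allowance = allowance    # slack that absorbs small, gradual drift
        self.threshold = threshold    # restart when the accumulated score exceeds this
        self.score = 0.0

    def update(self, nll):
        # Accumulate only the excess over baseline + allowance, floored at zero.
        self.score = max(0.0, self.score + (nll - self.baseline - self.allowance))
        if self.score > self.threshold:
            self.score = 0.0          # reset the statistic after signalling a restart
            return True
        return False

# Ten in-regime batches followed by five batches with degraded predictive fit.
monitor = CusumRestartMonitor(baseline=1.0)
nll_stream = [1.0] * 10 + [4.0] * 5
restarts = [t for t, nll in enumerate(nll_stream) if monitor.update(nll)]
print(restarts)  # [12]
```

In this stream the score climbs by 2.5 per degraded batch, so the restart fires on the third one. A larger `allowance` or `threshold` trades slower detection for fewer false alarms, which is the tuning sensitivity the paper attributes to its restart rules.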
Theoretical guarantees establish a cumulative loss bound on discrepancy tracking under stable propagation assumptions (Assumption 1 and Theorem 1), and false-alarm/detection delay bounds for C-BRPC’s wCUSUM restart detection. Interaction effects between discrepancy adaptation and restart detection sensitivity are analyzed, highlighting tradeoffs in predictive sharpness versus adaptation speed.
Evaluation comprises repeated trials (25 runs synthetic, 10 plant simulation) measuring model parameter RMSE, response prediction RMSE, number of restarts, precision and recall of detected restart events vs ground truth change points with 2-batch tolerance. Baselines include sliding-window Bayesian calibration (BC(80)) and a data assimilation method (DA). Ablations analyze discrepancy estimation modes, restart parameters, and high-dimensional performance (appendices).
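Restart precision and recall with a batch tolerance can be computed as sketched below. The matching rule is an assumption, since the summary does not specify one: each true change point may be claimed by at most one detection, detections are processed in time order, and an empty detection list scores zero precision.

```python
def restart_precision_recall(detected, true_cps, tol=2):
    """Precision and recall of restart events within +/- tol batches."""
    unmatched = sorted(true_cps)
    true_positives = 0
    for d in sorted(detected):
        # Claim the earliest still-unmatched change point within tolerance.
        match = next((c for c in unmatched if abs(d - c) <= tol), None)
        if match is not None:
            unmatched.remove(match)
            true_positives += 1
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(true_cps) if true_cps else 0.0
    return precision, recall

# Two true change points; four restarts, of which two land within tolerance.
precision, recall = restart_precision_recall([10, 11, 31, 50], [10, 30])
print(precision, recall)  # 0.5 1.0
```

The example shows how a detector that restarts often can reach full recall while its precision stays low, which is the pattern reported for B-BRPC relative to C-BRPC and B-BRPC-RRA.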
In a detailed synthetic example, BRPC particles are propagated through a transition kernel. When a new data batch arrives, particle weights are updated with the discrepancy-free likelihood, and the discrepancy is then updated by a conditional Gaussian process step on weighted residuals. Restart triggers based on the BOCPD or CUSUM score then either reset the process or let it continue.
Code release and replication details are not explicitly stated and dataset/code availability is unclear from the truncated text.
Overall, this method extends offline projected calibration to a theoretically principled online Bayesian recursive filtering scheme well suited for digital twin streaming calibration under complex temporal dynamics.
Technical innovations
- Extension of projected Bayesian calibration to an online sequential setting via separate discrepancy-free particle update for calibration parameters followed by conditional Gaussian process updates for model discrepancy.
- KL-regularized recursive Bayesian updates that balance adaptation speed and temporal smoothing to maintain tracking guarantees under gradual system changes.
- Integration of restart mechanisms coupling Bayesian online changepoint detection and CUSUM-like scoring with BRPC’s predictive posterior to dynamically reset calibration states upon abrupt regime shifts.
- Identification and management of the critical interaction between discrepancy learning adaptation and restart sensitivity, including residual re-anchoring to improve restart detection reliability.
- Theoretical bounds characterizing cumulative tracking loss under stable discrepancy propagation and probabilistic false alarm/detection delay guarantees for restart decisions.
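For readers unfamiliar with KL-regularized Bayesian updates, the generic form has a well-known closed-form solution. The statement below is the standard result, not a reproduction of the paper's Eq. 6, whose exact form may differ. Here π_t denotes the predictive prior obtained by pushing the previous posterior through p(θ_t|θ_{t−1}), and η > 0 is a hypothetical temperature introduced for illustration.

```latex
q_t^\star
  = \arg\min_{q}\;
    \Big\{
      \mathbb{E}_{q}\big[-\log p_{\mathrm{proj}}(Y_t \mid X_t, \theta_t)\big]
      + \tfrac{1}{\eta}\,\mathrm{KL}\big(q \,\|\, \pi_t\big)
    \Big\}
\quad\Longrightarrow\quad
q_t^\star(\theta_t)
  \;\propto\;
  \pi_t(\theta_t)\, p_{\mathrm{proj}}(Y_t \mid X_t, \theta_t)^{\eta}
```

With η = 1 this is ordinary Bayes' rule. In a particle approximation, it corresponds to weighting each propagated particle by its likelihood raised to the power η before resampling, so a smaller η keeps the posterior closer to the prior and smooths the updates over time.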
Datasets
- Synthetic streaming benchmark — 25 runs — custom benchmark based on sinusoidal functions with time-varying parameters
- Plant simulation benchmark — 10 runs — discrete-event factory simulation digital twin, proprietary
Baselines vs proposed
- BC(80) sliding-window calibration: θ-RMSE = 0.142 ± 0.010 (synthetic drifting) vs BRPC: 0.014 ± 0.002
- DA data assimilation: Response RMSE = 1.909 ± 0.116 (synthetic drifting) vs BRPC: 0.484 ± 0.225
- B-BRPC: Precision@2 = 0.314, Recall@2 = 0.926 (Sudden 3 synthetic) vs B-BRPC-RRA: Precision@2 = 0.986, Recall@2 = 0.999
- C-BRPC restarts fewer times (e.g., 3.7 vs 10.3 in Sudden 3 synthetic) with higher restart precision (0.725 vs 0.314) than B-BRPC
- Plant-simulation Sudden 5: DA θ-RMSE = 4.259 ± 0.238 vs B-BRPC 0.957 ± 0.086, Response RMSE 1.000 vs 0.164
- Mixed scenarios show BRPC variants maintain superior parameter and response accuracy over baselines with varying restart performance
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.06612.

Fig 1: Digital-twin environment for the bicycle-production plant-simulation benchmark. The

Fig 2: EnKF sensitivity heatmaps for θ-RMSE on the synthetic SLOPE, SUDDEN, and MIXED

Figs 3–8 (page 42).
Limitations
- Codebase and full reproducibility details, including seeded experiments and dataset access, are not explicitly provided.
- Experiments are limited to relatively low-dimensional calibration targets; scalability to very high-dimensional digital twins is not yet demonstrated.
- Restart mechanisms, especially B-BRPC-RRA, have increased computational costs due to offline residual refitting after restarts.
- The restart approaches rely on tuning of hyperparameters (thresholds, drift allowances) which may require domain expertise or cross-validation in practical deployment.
- No adversarial robustness analysis performed; performance under intentionally manipulated or corrupted data streams is unknown.
- Evaluation is limited to synthetic and one plant simulation benchmark; real-world industrial digital twin testing is pending.
Open questions / follow-ons
- How to scale BRPC to very high-dimensional calibration problems common in complex digital twins with many latent parameters?
- Can restart mechanisms be further improved to provide stronger formal false-alarm rate guarantees and more adaptive memory reset strategies without degrading detection delay?
- How does BRPC perform under non-Gaussian noise, non-stationary observation noise variance, or corrupted/missing data streams?
- Can BRPC’s two-stage decoupling of parameter and discrepancy updates be extended to other model discrepancy representations or non-Gaussian process models?
Why it matters for bot defense
BRPC is not about CAPTCHA-breaking or bot detection, but its ideas transfer to online model adaptation in nonstationary adversarial environments. Two are most relevant for bot-defense and CAPTCHA practitioners. First, updating calibration parameters separately from systematic bias keeps genuine parameter shifts from being confounded with model bias, which matters when behavioral or interaction models for CAPTCHA Learning and Adaptation (LA) must stay calibrated over time. Second, restart controls driven by predictive-likelihood evidence could be adapted to trigger retraining or a model refresh when bot strategies shift abruptly, improving robustness to concept drift. The paper's tracking guarantees and its empirical validation under gradual and sudden change offer a template for adaptive security models that recalibrate continuously on streaming data while detecting regime changes.
Cite
@article{arxiv2605_06612,
  title={Online Bayesian Calibration under Gradual and Abrupt System Changes},
  author={Yang Xu and Chiwoo Park},
  journal={arXiv preprint arXiv:2605.06612},
  year={2026},
  url={https://arxiv.org/abs/2605.06612}
}