Free Parametrization of L_2-Bounded Structured State-Space Controllers for Nonlinear Control with Stability Guarantees
Source: arXiv:2606.11049 · Published 2026-06-09 · By Muhammad Zakwan, Leonardo Massai, Efe C. Balta, Giancarlo Ferrari-Trecate
TL;DR
This paper addresses a central problem in nonlinear control: designing controllers that optimize complex objectives without sacrificing closed-loop stability. Neural network controllers trained by unconstrained parameter optimization can destabilize a nonlinear system. The authors build on Structured State-Space Models (SSMs) and propose a free parametrization of discrete-time Linear Time-Invariant (LTI) systems with a prescribed L2-gain bound. This yields the L2-Recurrent Unit (L2RU), an SSM layer whose L2 gain is guaranteed by construction, so nonlinear controller objectives can be optimized without constraints while stability follows from the small-gain theorem. The approach reduces the computational overhead usually needed to enforce stability constraints and parallelizes efficiently over long input sequences.
The paper demonstrates the method on a multi-robot formation control task with obstacle and collision avoidance, showing stable closed-loop behaviors and effective performance. The trained L2RU controllers satisfy the tight L2-gain bound derived from the plant analysis, which suggests that, at least on this task, the free parametrization does not noticeably reduce controller expressivity or capability. These results support the potential of L2RU architectures as scalable, theoretically sound building blocks for stable nonlinear control with modern learning techniques.
Key findings
- Introduced a free parametrization κγ of discrete-time LTI systems guaranteeing L2-gain bounded by γ for all parameter values ω, enabling unconstrained optimization.
- The complex-diagonal parametrization of A matrix allows parallel-scan-based inference, improving computational efficiency over prior dense parametrizations (Massai and Ferrari-Trecate, 2025).
- Closed-loop stability of the nonlinear system is guaranteed by design via the small-gain theorem if γ·γ̂ < 1, where γ̂ is the plant L2-gain computed by semidefinite programming (18.573 in the robot swarm example).
- L2RU controllers with 3 layers, nh=11 states, nd=nz=12, and γ = 1/(18.573+0.001) successfully stabilized a nonlinear multi-robot system with collision and obstacle avoidance.
- Imposing the L2 bound on controllers does not noticeably degrade control performance or expressive capacity in the simulated task (Figures 2 and 3).
- The L2RU forward pass is highly parallelizable and more efficient than Neural ODE-based or REN approaches that require implicit solves or sequential passes.
- Parametrization supports model sizes and input/output dimensions beyond previous square system restrictions, increasing practical applicability.
- The approach decouples stability enforcement from nonlinear objective optimization, simplifying controller training compared to constrained/stability projection methods.
Threat model
The adversary is the nonlinear dynamics of the plant that can cause closed-loop instability if the controller is not properly constrained. The plant is assumed L2-bounded with a known or estimated gain. The method cannot defend against adversarial inputs beyond the L2-gain assumptions or models with unknown/unbounded nonlinearities. The main threat is uncontrolled destabilization due to unconstrained controller parameter optimization.
Methodology — deep read
Threat Model & Assumptions: The adversary in this context is the nonlinear system itself, which can become unstable when controlled by unconstrained NN policies. The plant is assumed to have a known or estimable finite L2-gain γ̂. Stability is guaranteed by ensuring the controller’s L2-gain γ satisfies the small-gain condition γ·γ̂ < 1. The approach assumes well-posed closed-loop interconnections and that the plant’s L2-gain can be bounded conservatively.
Data: The main demonstration uses a nonlinear discretized wheeled robot from the Robotarium platform, modeling state x = (position, orientation) and control inputs (linear and angular velocities). Multiple robots form a swarm with decoupled dynamics where the overall L2-gain equals that of individual agents (18.573). The initial states (20 samples) form a dataset over which the empirical cost is averaged.
Architecture / Algorithm: The L2RU consists of r=3 stacked State-Space Layers (SSLs), each comprising a discrete-time LTI system parametrized freely to guarantee an L2 bound, followed by a Lipschitz-bounded nonlinear activation (e.g., deep MLP with Lipschitz constraints). The LTI system parameters (A,B,C,D) are generated by the novel free parametrization κγ(ω), encoding a complex-diagonal A matrix to enable parallel scan computations. An encoder and decoder linearly transform inputs and outputs. The L2-gain constraint is enforced by construction, allowing the nonlinearities and linear blocks to be combined without explicit stability constraints.
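To illustrate why a diagonal A matrix permits scan-based evaluation, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `diag_sequential` and `diag_scan` are hypothetical, and the doubling loop only emulates a parallel scan with vectorized array operations. With a diagonal A = diag(lam), the state update h[k] = lam * h[k-1] + b[k] (where b[k] stands for B u[k]) can be written as the composition of affine maps, and composing pairs (a, b) is associative, so the prefix results can be built in about log2(T) combining steps instead of T sequential ones.

```python
import numpy as np


def diag_sequential(lam, b):
    """Reference loop: h[k] = lam * h[k-1] + b[k], starting from a zero state."""
    state = np.zeros(b.shape[1], dtype=complex)
    out = np.empty(b.shape, dtype=complex)
    for k in range(b.shape[0]):
        state = lam * state + b[k]
        out[k] = state
    return out


def diag_scan(lam, b):
    """Same recurrence via an inclusive scan over pairs (a, h).

    Each pair represents the affine map x -> a * x + h. Applying an
    earlier map (a1, h1) and then a later map (a2, h2) gives
    (a1 * a2, a2 * h1 + h2), which is an associative operation.
    """
    T = b.shape[0]
    a = np.broadcast_to(lam, b.shape).astype(complex)  # writable copy
    h = b.astype(complex)                               # writable copy
    shift = 1
    while shift < T:
        a_new, h_new = a.copy(), h.copy()
        # Combine the window ending at k - shift (earlier) with the one ending at k.
        h_new[shift:] = a[shift:] * h[:-shift] + h[shift:]
        a_new[shift:] = a[shift:] * a[:-shift]
        a, h = a_new, h_new
        shift *= 2
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stable complex-diagonal dynamics: eigenvalues with modulus 0.9.
    lam = 0.9 * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=4))
    b = rng.standard_normal((37, 4)) + 1j * rng.standard_normal((37, 4))
    print(np.allclose(diag_scan(lam, b), diag_sequential(lam, b)))
```

Each pass of the `while` loop is elementwise over the whole sequence, which is what makes the computation parallel-friendly. Array frameworks expose the same idea as a primitive, for example `jax.lax.associative_scan` in JAX.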
Training Regime: The controller is trained with the Adam optimizer at a learning rate of 1e-3, using backpropagation through time over finite horizons to minimize a composite cost that includes goal tracking, input effort, collision avoidance, and obstacle avoidance. Hyperparameters for layer sizes (nh=11, nd=nz=12), Lipschitz constants, and γ are set to satisfy the small-gain theorem with a small margin (ε=0.001). Training samples cover multiple initial states (S=20), supporting generalization.
Evaluation Protocol: Stability is certified analytically by the small-gain theorem, using the computed L2-gain bound of the plant. Performance is evaluated empirically by analyzing agent trajectories, collision occurrences, and objective scores (illustrated in Figures 2 and 3). The baseline is an unconstrained controller with no explicit L2 bound. Ablations on the parametrization structure are discussed, though detailed numerical ablations are deferred to the extended report.
Reproducibility: Code for L2RU implementation is publicly released at https://github.com/DecodEPFL/SSMs-for-Control/tree/master. Detailed proofs and parametrization maps are given in the accompanying technical reports (Massai et al., 2025). The Robotarium platform example is a standard benchmark with well-defined dynamics. No closed dataset restrictions apply.
Concrete Example: For the robot swarm task, the plant L2-gain is estimated as 18.573 via SDP. The controller LTI parametrization κγ is set with γ just below 1/γ̂. Adam training minimizes a nonlinear cost over sampled initial states with the L2RU architecture. Post-training, robot trajectories demonstrate collision-free navigation while respecting stability bounds guaranteed by the architecture design and the small-gain theorem.
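The arithmetic behind the gain choice in this example can be sketched as follows. The helper names `controller_gain_bound` and `satisfies_small_gain` are hypothetical, introduced only for illustration; the numbers 18.573 and 0.001 are the plant gain and margin quoted above.

```python
def controller_gain_bound(plant_gain, margin):
    """Return gamma = 1 / (plant_gain + margin), which lies strictly below 1 / plant_gain."""
    if plant_gain <= 0.0 or margin <= 0.0:
        raise ValueError("plant_gain and margin must be positive")
    return 1.0 / (plant_gain + margin)


def satisfies_small_gain(controller_gain, plant_gain):
    """Small-gain condition: the product of the two L2 gains is strictly below 1."""
    return controller_gain * plant_gain < 1.0


if __name__ == "__main__":
    plant_gain = 18.573   # plant L2-gain reported for the robot swarm example
    margin = 0.001        # margin epsilon reported in the training setup
    gamma = controller_gain_bound(plant_gain, margin)
    print(gamma, gamma * plant_gain, satisfies_small_gain(gamma, plant_gain))
```

With these values the loop gain is 18.573/18.574, roughly 0.99995, so the condition holds with a very thin margin. Any conservatism in the plant gain estimate therefore translates directly into a smaller admissible controller gain.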
Technical innovations
- A free parametrization κγ of discrete-time LTI systems with guaranteed L2-gain γ by construction, allowing unconstrained parameter optimization.
- Use of complex-diagonal matrix A parametrization for L2RU enabling efficient, highly parallelizable forward passes via parallel scan algorithms.
- Extension of prior L2-bounded SSM parametrizations from square systems to general rectangular systems with arbitrary input/output/state dimensions.
- Integration of free L2RU parametrization with the small-gain theorem to guarantee closed-loop stability of nonlinear systems under unconstrained nonlinear controller optimization.
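As a numerical sanity check on any candidate (A, B, C, D), one can use the standard fact that the L2 gain of a stable discrete-time LTI system equals its H-infinity norm, the supremum over frequency of the largest singular value of C (e^{jω} I − A)^{-1} B + D. The sketch below evaluates this on a frequency grid. It is an illustrative assumption rather than anything from the paper: `hinf_norm_grid` is a hypothetical helper, and a grid evaluation only gives a lower estimate of the true norm, so it can flag a violated bound but cannot certify one. The guarantee in the paper comes from the parametrization itself, not from a check of this kind.

```python
import numpy as np


def hinf_norm_grid(A, B, C, D, n_freq=2001):
    """Lower estimate of the H-infinity norm of x+ = A x + B u, y = C x + D u.

    Evaluates the largest singular value of the frequency response on a
    uniform grid over [-pi, pi]. The full interval is used because a system
    with complex matrices need not have a symmetric frequency response.
    """
    n = A.shape[0]
    eye = np.eye(n)
    worst = 0.0
    for w in np.linspace(-np.pi, np.pi, n_freq):
        # Frequency response G(e^{jw}) = C (e^{jw} I - A)^{-1} B + D
        G = C @ np.linalg.solve(np.exp(1j * w) * eye - A, B) + D
        worst = max(worst, np.linalg.svd(G, compute_uv=False)[0])
    return worst


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical complex-diagonal system with eigenvalues inside the unit disk.
    lam = 0.8 * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=3))
    A = np.diag(lam)
    B = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
    C = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
    D = np.zeros((2, 2))
    print(hinf_norm_grid(A, B, C, D))
```

A quick way to trust the helper is the scalar system x+ = 0.5 x + u, y = x, whose frequency response 1/(e^{jω} − 0.5) peaks at ω = 0 with magnitude 2.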
Datasets
- Robotarium wheeled robot swarm simulation — 20 initial states sampled for training — simulation platform example
Baselines vs proposed
- Controller without L2 bound: unstable closed-loop observed vs L2RU controller with γ=1/(18.573+0.001): stable, collision-free trajectories
- Dense matrix parametrization from Massai and Ferrari-Trecate (2025): slower simulation time vs complex-diagonal L2RU parametrization with parallel scan: faster by significant margin (quantitative speedup in extended report)
- Unconstrained nonlinear controller optimization: non-trivial constrained training (nonconvex/QCQP) vs L2RU free parametrization: unconstrained gradient-based optimization viable
Limitations
- L2-gain bound of the plant must be known or conservatively estimated; inaccurate bounds may limit applicability or lead to overly conservative controllers.
- Parametrization relies on sufficient but not necessary conditions, so may not fully capture all L2-bounded LTI systems, potentially limiting controller expressivity in some edge cases.
- Experiments focus on simulation with a specific nonlinear robot swarm example; real-world validation and generalization to other nonlinear plants remain future work.
- Does not include adversarial or worst-case disturbance robustness evaluation beyond L2-gain framework.
- The small-gain theorem applies to feedback interconnections but may not capture all possible nonlinearities or unmodeled dynamics in practice.
- Initialization and hyperparameter choices may influence training convergence; guidelines provided but no automatic tuning demonstrated.
Open questions / follow-ons
- How to extend or adapt the L2RU free parametrization to explicitly incorporate robustness against adversarial disturbances or model uncertainties beyond L2-gain assumptions?
- What are the trade-offs between tightness of the L2-gain bound and controller expressivity in more complex nonlinear systems or real hardware?
- Can the L2RU parametrization be efficiently integrated with model-based reinforcement learning to enable stable learning of nonlinear control policies?
- How does the proposed parametrization generalize beyond discrete-time systems or interface with continuous-time control settings?
Why it matters for bot defense
For bot-defense and CAPTCHA applications involving nonlinear dynamical systems or recurrent neural architectures, this paper offers a principled method to enforce stability constraints during controller training by designing NN-based policies with guaranteed L2 gain bounds. This approach decouples stability from optimization, allowing unconstrained learning of complex objectives while ensuring robustness against input perturbations that could destabilize the system. The highly parallelizable L2RU architecture also enables efficient inference on long input sequences, relevant for real-time or large-scale verification tasks.
While not focused on traditional CAPTCHA schemes, the free parametrization techniques and stability guarantees provide foundational insights into constructing dependable recurrent neural components in adversarial or safety-critical contexts. Bot-defense engineers can leverage these methods to design stable, interpretable recurrent policies or filters, reducing risk of instability-related failure modes common in naive NN controllers. The link to small-gain theorem-based stability certificates is particularly useful for rigorous system verification in security-sensitive deployments.
Cite
@article{arxiv2606_11049,
title={Free Parametrization of L_2-Bounded Structured State-Space Controllers for Nonlinear Control with Stability Guarantees},
author={Muhammad Zakwan and Leonardo Massai and Efe C. Balta and Giancarlo Ferrari-Trecate},
journal={arXiv preprint arXiv:2606.11049},
year={2026},
url={https://arxiv.org/abs/2606.11049}
}