Rapid co-design of Buoyancy-assisted robots for Challenging Locomotion using Gaussian Evolutionary Specialists

Source: arXiv:2606.07424 · Published 2026-06-05 · By Ankit Sinha, Nitish Sontakke, Dennis Hong, Yusuke Tanaka, Sehoon Ha

TL;DR

This paper addresses the computational bottleneck in robot co-design, the joint optimization of robot morphology and control policies for locomotion tasks. Existing model-free Reinforcement Learning (RL) methods require costly per-design policy training, making design exploration expensive. Prior work sidesteps this with universal policies conditioned on design parameters, which allow zero-shot evaluation but suffer from behavioral diversity collapse: they converge to a single suboptimal locomotion strategy across morphologies. End-to-end Mixture-of-Experts (MoE) methods also fail, because of representation collapse in which the experts homogenize. The authors propose Gaussian Evolutionary Specialists (GES), a framework that decouples design-space partitioning from policy learning. GES assigns each specialist a Gaussian-parameterized region of the design space and iteratively refines these territories based on specialist performance, which prevents homogenization and captures diverse behaviors explicitly.

Evaluated on the buoyancy-assisted BALLU bipedal robot platform, GES discovers morphologies yielding 5-25% higher locomotion performance than naive universal policies on robust obstacle traversal and ramp climbing tasks. Hardware testing confirms a GES-optimized design can clear a 24 cm obstacle (3× baseline). GES also reduces design optimization time by 37% compared to retraining policies for each morphology. Overall, GES effectively mitigates diversity collapse, achieves multi-modal specialist policies, and accelerates co-design with zero-shot evaluations.

Key findings

GES reduces design optimization time by 37% compared to per-design RL training on the 3D design space.
On the obstacle traversal task in 2D, GES achieves 36.9 cm maximum obstacle clearance, a mean improvement of 25.5% over e2e MLP (29.4 cm) and 8.62% over e2e MoE (34.0 cm).
On the 3D obstacle task, GES clears 54 cm obstacles vs 41 cm for e2e MLP, a peak gain of 31.7%, with 18.8% mean improvement over e2e MLP.
GES wins head-to-head against e2e MLP on 94% (2D) and 68% (3D) of designs tested for obstacle traversal.
For ramp climbing in 2D, GES achieves a 31.4° slope ascent vs 27.9° for the e2e MLP baseline, an 11.2% peak gain and 13% mean improvement.
e2e MoE suffers ~20% performance degradation as design space dimensionality increases due to representation collapse.
Hardware tests confirm GES-optimized BALLU clears 24 cm obstacles vs baseline design's 8 cm, a 3× improvement.
GES avoids behavioral diversity collapse by hard-assigning design-space Gaussian territories per specialist, enabling multi-modal locomotion strategies.

Threat model

The paper considers no adversary in the security sense. Its implicit "adversary" is behavioral diversity collapse: when universal and MoE policies are trained over heterogeneous robot morphologies, gradient averaging and representation collapse drive all policies toward a single suboptimal controller that fails to differentiate and evaluate diverse designs. No human adversary or attacker is considered.

Methodology — deep read

The paper addresses the co-design problem of jointly optimizing robot morphology (physical design parameters) and control policy to maximize locomotion performance on tasks like obstacle traversal and ramp climbing. The inherent bi-level optimization setup has an inner loop training a control policy for each morphology, and an outer loop searching design space, which is computationally expensive.
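One common way to write this bi-level structure is shown below. The symbols are introduced here for illustration and are not taken from the paper: $d$ is a design in the design space $\mathcal{D}$, $\theta$ are policy parameters, and $J(d, \theta)$ is the expected task return of policy $\theta$ on design $d$.

```latex
d^{*} = \arg\max_{d \in \mathcal{D}} \; J\bigl(d, \theta^{*}(d)\bigr),
\qquad
\theta^{*}(d) = \arg\max_{\theta} \; J(d, \theta)
```

The inner $\arg\max$ is the per-design policy training that makes naive co-design expensive. A morphology-conditioned policy replaces $\theta^{*}(d)$ with a single evaluation of an already trained policy.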

The setup assumes a simulated environment (Isaac Sim 4.5) that models BALLU robot dynamics, including complex nonlinear aerodynamics. The only "adversary" is diversity collapse in multi-task RL policies across morphologies; there is no adversarial attacker.

Data consist of morphology parameter sets sampled from 2D or 3D continuous, bounded design spaces: (GCR, SPCF) or (GCR, SPCF, leg length). Design parameters are normalized and bounded (e.g. femur and tibia lengths from 0.25 to 0.55 m, gravity compensation ratio (GCR) from 0.72 to 0.90). Data splits are not explicitly described, but performance is evaluated on 1000 held-out designs.
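A minimal sketch of sampling and normalizing designs within the quoted bounds follows. The min-max mapping to [0, 1] is an assumption, since the summary only says parameters are "normalized and bounded". SPCF is omitted because its bounds are not given, and the dictionary keys are hypothetical names.

```python
import random

# Bounds quoted in the summary. SPCF is left out because its range is not stated.
BOUNDS = {
    "gcr": (0.72, 0.90),           # gravity compensation ratio
    "leg_length_m": (0.25, 0.55),  # femur/tibia length in metres
}

def sample_design(rng):
    """Draw one design uniformly inside the bounds."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}

def normalize(design):
    """Map each parameter to [0, 1] (assumed min-max normalization)."""
    return {k: (design[k] - lo) / (hi - lo) for k, (lo, hi) in BOUNDS.items()}

def denormalize(unit):
    """Inverse of normalize: map [0, 1] values back to physical units."""
    return {k: lo + unit[k] * (hi - lo) for k, (lo, hi) in BOUNDS.items()}

rng = random.Random(0)
held_out = [sample_design(rng) for _ in range(1000)]  # e.g. a held-out test set
print(len(held_out), normalize(held_out[0]))
```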

Architecture: Control policies are Gaussian MLPs with [128,64,32] hidden layers that output joint targets. Policies are trained with PPO on augmented states (robot state concatenated with morphology parameters), so each policy is conditioned on the morphology. Universal policies are trained monolithically across all morphologies; mixtures-of-experts (MoE) combine multiple such policies, gated by a learned router network.
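The morphology-conditioned Gaussian MLP can be sketched in plain Python as below. Only the hidden sizes [128, 64, 32] come from the summary. The tanh activation, the state-independent log standard deviation, the initialization, and the observation/action dimensions are all assumptions made for illustration.

```python
import math
import random

HIDDEN = [128, 64, 32]  # hidden layer sizes quoted in the summary

def init_layer(rng, n_in, n_out):
    """Random weight matrix (n_out x n_in) and zero bias; init scheme is assumed."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def init_policy(rng, obs_dim, design_dim, act_dim):
    """Build layers for input = robot state concatenated with design parameters."""
    sizes = [obs_dim + design_dim] + HIDDEN + [act_dim]
    layers = [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
    log_std = [0.0] * act_dim  # assumed state-independent log std
    return layers, log_std

def policy_mean(layers, obs, design):
    """Forward pass: tanh on hidden layers (assumed), linear output layer."""
    x = list(obs) + list(design)  # augmented state
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x

def sample_action(rng, layers, log_std, obs, design):
    """Sample joint targets from the diagonal Gaussian around the MLP mean."""
    mean = policy_mean(layers, obs, design)
    return [rng.gauss(m, math.exp(s)) for m, s in zip(mean, log_std)]

rng = random.Random(0)
layers, log_std = init_policy(rng, obs_dim=10, design_dim=2, act_dim=4)  # hypothetical sizes
print(sample_action(rng, layers, log_std, [0.0] * 10, [0.5, 0.5]))
```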

Gaussian Evolutionary Specialists (GES) introduce K specialists assigned to Gaussian regions (each defined by mean and diagonal covariance) in design space rather than a learned router. GES cycles through:

Initialization of K Gaussian centers via farthest point sampling and Lloyd relaxation to spread centers for coverage.
Iterative territory evolution, repeating:
- Training each specialist policy on samples drawn from its Gaussian region via PPO (resuming checkpoints).
- Probing sampled designs near Gaussian borders, evaluated by all specialists. Each design is assigned competitively to the specialist with the highest score.
- Re-fitting the Gaussian parameters of each specialist's territory (mean and diagonal covariance) to its winning design set.

This continues until Monte Carlo coverage across the design space reaches 95%. This stable geometric partitioning avoids the router collapse seen in MoE.
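The probing, competitive assignment, and refitting steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: PPO training is left out, the score functions are stand-ins for specialist rollouts, and the coverage test (a point counts as covered if it lies within `k_sigma` standard deviations of some territory in every dimension) is a hypothetical definition, since the summary does not say how coverage is measured.

```python
import math
import random

def fit_diag_gaussian(points, min_std=1e-3):
    """Mean and per-dimension standard deviation of a set of design points."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    std = [
        max(min_std, math.sqrt(sum((p[j] - mean[j]) ** 2 for p in points) / n))
        for j in range(dim)
    ]
    return mean, std

def assign_competitively(designs, score_fns):
    """Give each probed design to the specialist with the highest score."""
    winners = [[] for _ in score_fns]
    for d in designs:
        scores = [score(d) for score in score_fns]
        winners[scores.index(max(scores))].append(d)
    return winners

def refit_territories(territories, winners):
    """Refit each territory to its winning set; keep the old one if it won nothing."""
    return [fit_diag_gaussian(w) if w else t for t, w in zip(territories, winners)]

def coverage(territories, rng, n=2000, k_sigma=2.0):
    """Monte Carlo fraction of the unit design space covered (assumed criterion)."""
    dim = len(territories[0][0])
    hits = 0
    for _ in range(n):
        x = [rng.random() for _ in range(dim)]
        if any(
            all(abs(x[j] - m[j]) <= k_sigma * s[j] for j in range(dim))
            for m, s in territories
        ):
            hits += 1
    return hits / n

# Toy run: two stand-in "specialists" that prefer low and high first coordinates.
score_fns = [lambda d: -abs(d[0] - 0.2), lambda d: -abs(d[0] - 0.8)]
rng = random.Random(0)
probes = [[rng.random(), rng.random()] for _ in range(200)]
territories = [([0.5, 0.5], [0.1, 0.1]), ([0.5, 0.5], [0.1, 0.1])]
territories = refit_territories(territories, assign_competitively(probes, score_fns))
print(territories, coverage(territories, rng))
```

In a full loop, these three calls would sit inside a `while coverage(...) < 0.95` iteration, with a PPO update for each specialist before every probing round.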

Training runs in the Isaac Lab simulator with physics at 200 Hz and control at 20 Hz. The number of PPO training steps per iteration is not specified. Hyperparameters include the number of training samples per iteration and the number of probe samples per border.
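With physics at 200 Hz and control at 20 Hz, each policy action spans 10 physics steps. The sketch below shows that relationship with a toy integrator. Holding the action constant between control steps is an assumption, and `policy` and `physics_step` are hypothetical callables rather than Isaac Lab APIs.

```python
PHYSICS_HZ = 200
CONTROL_HZ = 20
DECIMATION = PHYSICS_HZ // CONTROL_HZ  # physics steps per control step (10)

def rollout(policy, physics_step, state, n_control_steps):
    """Query the policy at the control rate and hold its action between queries."""
    for _ in range(n_control_steps):
        action = policy(state)
        for _ in range(DECIMATION):
            state = physics_step(state, action)
    return state

# Toy example: the state is a position and the action is a constant velocity of 1 m/s.
final = rollout(
    policy=lambda s: 1.0,
    physics_step=lambda s, a: s + a / PHYSICS_HZ,
    state=0.0,
    n_control_steps=CONTROL_HZ,  # one simulated second
)
print(DECIMATION, final)
```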

Evaluation: Performance is measured on two tasks, ramp climbing (max slope) and obstacle traversal (max obstacle height cleared), with reward functions defined by navigation distance, forward velocity, and toe clearance. The baselines are monolithic end-to-end MLP policies and end-to-end MoE policies trained on design-randomized data.

Extensive held-out tests on 1000 unseen morphologies evaluate performance distributions and head-to-head comparisons. Hardware validation uses a teleoperated version of the optimized morphology to isolate the impact of the design from policy sim-to-real transfer.
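A head-to-head comparison over held-out designs reduces to a per-design win count. The sketch below assumes paired scores on the same designs and counts ties for neither side. Whether the paper aggregates mean improvement as a ratio of means (as here) or as a mean of per-design ratios is not stated. The scores are made-up toy values.

```python
def head_to_head_win_rate(scores_a, scores_b):
    """Fraction of designs on which method A strictly beats method B."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    return wins / len(scores_a)

def mean_improvement_pct(scores_a, scores_b):
    """Percentage by which the mean of A exceeds the mean of B (assumed definition)."""
    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    return 100.0 * (mean_a - mean_b) / mean_b

# Toy obstacle heights (cm) on four designs; not data from the paper.
ges = [30.0, 25.0, 40.0, 20.0]
mlp = [28.0, 26.0, 35.0, 20.0]
print(head_to_head_win_rate(ges, mlp), mean_improvement_pct(ges, mlp))
```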

Reproducibility: The paper does not explicitly mention a code or dataset release. The simulators and the BALLU hardware platform are publicly referenced, and the details given appear sufficient to replicate the method with standard RL and Gaussian mixture frameworks.

Concrete example: For K=3 specialists in the 2D space, GES initializes the Gaussian centers with FPS plus Lloyd relaxation, then iterates: train the specialists, probe border samples that the specialists compete for, and refit each territory to its winning set. Territories evolve from small, separate regions until they collectively tile the space, each with a distinct local locomotion strategy. Running specialists from different territories on the same design produces qualitatively distinct foot trajectories, confirming multi-modal behavior. The Bayesian optimization (BO) co-design loop uses GES for zero-shot evaluation: each proposed design is assigned to the nearest specialist Gaussian and scored without retraining, which accelerates the morphology search.
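The initialization in this example can be sketched as farthest point sampling (FPS) over a set of candidate designs, followed by a few Lloyd relaxation steps. Running both over uniformly sampled candidates in the unit square, and the iteration count, are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_sampling(points, k, start=0):
    """Greedily pick k points, each one farthest from those already chosen."""
    centers = [points[start]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

def lloyd_relax(points, centers, iters=10):
    """Repeatedly move each center to the mean of the points nearest to it."""
    for _ in range(iters):
        cells = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            cells[nearest].append(p)
        centers = [
            [sum(col) / len(cell) for col in zip(*cell)] if cell else old
            for cell, old in zip(cells, centers)
        ]
    return centers

rng = random.Random(0)
candidates = [[rng.random(), rng.random()] for _ in range(500)]  # normalized 2D designs
centers = lloyd_relax(candidates, farthest_point_sampling(candidates, k=3))
print(centers)
```

FPS spreads the initial centers apart, and the Lloyd steps then pull each one toward the middle of its cell, so the K initial territories start with roughly even coverage.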

Technical innovations

Identification and formal characterization of behavioral diversity collapse as the failure mode where universal policies trained across morphologies collapse to a uni-modal strategy.
Gaussian Evolutionary Specialists (GES) framework that decouples design-space partitioning from policy learning using iterative Gaussian territory evolution to maintain specialization.
Competitive probing and territory refitting mechanism ensures that specialists expand or contract Gaussian coverage based on relative policy performance, avoiding MoE representation collapse.
Use of a geometry-based density router for zero-shot morphology-conditioned policy evaluation within Bayesian optimization, eliminating costly inner-loop retraining.
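A geometry-based router of the kind described in the last item can be sketched as choosing the specialist whose diagonal Gaussian gives the proposed design the highest log density. This is an assumed reading of "density router". The territories and the `evaluate_fns` callables are hypothetical stand-ins for trained specialists.

```python
import math

def log_density(x, mean, std):
    """Log density of x under a diagonal Gaussian with the given mean and std."""
    return sum(
        -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for xi, m, s in zip(x, mean, std)
    )

def route(design, territories):
    """Index of the specialist whose territory assigns the design the highest density."""
    return max(range(len(territories)), key=lambda i: log_density(design, *territories[i]))

def zero_shot_score(design, territories, evaluate_fns):
    """Score a proposed design with the routed specialist, without any retraining."""
    i = route(design, territories)
    return i, evaluate_fns[i](design)

# Hypothetical territories (mean, std) in a normalized 2D design space.
territories = [([0.2, 0.2], [0.1, 0.1]), ([0.8, 0.8], [0.1, 0.1])]
evaluate_fns = [lambda d: 1.0, lambda d: 2.0]  # stand-ins for specialist rollouts
print(zero_shot_score([0.9, 0.7], territories, evaluate_fns))
```

An outer optimizer such as Bayesian optimization would call `zero_shot_score` as its objective, which is what removes the inner training loop from each design evaluation.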

Datasets

BALLU morphology design space — 1000+ sampled designs for evaluation — synthetic/simulated
No publicly released dataset but simulation environment and parameter bounds are detailed.

Baselines vs proposed

End-to-end monolithic MLP policy: mean obstacle traversal = 29.4 cm (2D), vs GES 36.9 cm
End-to-end MoE: mean obstacle traversal = 34.0 cm (2D), vs GES 36.9 cm
End-to-end monolithic MLP policy: mean ramp climbing slope = 27.9° (2D), vs GES 31.4°
End-to-end MoE: mean ramp climbing slope = 28.4° (2D), vs GES 31.4°
End-to-end monolithic MLP policy: mean obstacle traversal = 24.5 cm (3D), vs GES 29.1 cm
GES wins head-to-head 94% vs e2e MLP and 72% vs e2e MoE in 2D obstacle traversal
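As a quick arithmetic check, some of the relative gains quoted in this summary can be recomputed from the rounded values. The helper name is hypothetical. Percentages derived from rounded figures need not match every number the paper reports from unrounded data.

```python
def relative_gain_pct(proposed, baseline):
    """Percentage by which `proposed` exceeds `baseline`."""
    return 100.0 * (proposed - baseline) / baseline

# (label, GES value, baseline value), taken from figures quoted in this summary.
pairs = [
    ("2D obstacle vs e2e MLP (cm)", 36.9, 29.4),
    ("3D obstacle, mean, vs e2e MLP (cm)", 29.1, 24.5),
    ("3D obstacle, peak, vs e2e MLP (cm)", 54.0, 41.0),
]
for label, ges, base in pairs:
    print(f"{label}: {relative_gain_pct(ges, base):.1f}%")
# prints 25.5%, 18.8% and 31.7% respectively

print(f"hardware obstacle ratio: {24 / 8:.0f}x")  # 24 cm vs 8 cm gives 3x
```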

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.07424.

Fig 1: (a) A BALLU robot walking up a ramp inclined at

Fig 2: BALLU Hardware

Fig 3: GES: Our proposed algorithm for training a mixture

Fig 4: (a) Performance distribution of GES vs baselines in

Fig 5: Expert contribution distribution for 4 experts across

Fig 6: Left and right toe (z) trajectories of 3 GES specialists

Fig 7: Evolution of 6 specialists in the 3D design space (spring coefficient, buoyancy, and symmetric leg length)

Fig 8: GES performance as a function of the number of

Limitations

Fixed number of specialists K — sensitivity to this hyperparameter impacts coverage and training balance; adaptive K not explored.
Evaluations restricted to a single buoyancy-assisted bipedal platform (BALLU); generalization to other morphologies and robot types untested.
Assumes continuous, bounded design space and Gaussian territory shapes; discrete or unbounded parameters would require new methods.
Hardware results are limited to teleoperated validation isolating morphology effects; full autonomous policy sim-to-real transfer not demonstrated.
MoE baseline collapse observed but other advanced gating or routing methods not explored as comparisons.
Complete code and dataset release not mentioned, limiting reproducibility.

Open questions / follow-ons

Can GES be extended to dynamically add or merge specialists based on performance or coverage metrics to reduce sensitivity to the fixed K parameter?
How does GES generalize to higher-dimensional or discrete morphology parameter spaces, or to more complex robot platforms like quadrupeds or humanoids?
Can the GES approach incorporate online adaptation or meta-learning to simultaneously optimize control and morphology in real-time hardware deployment?
What are the implications of GES when coupled with more advanced policy architectures (e.g., Graph Neural Networks or Transformer-based controllers) and richer task domains?

Why it matters for bot defense

The findings in this paper are directly relevant to bot-defense and CAPTCHA practitioners interested in co-adapting system design and control policies efficiently under multi-modal behavioral distributions. The behavioral diversity collapse phenomenon identified parallels collapse issues in multi-task or multi-modal ML models common in bot recognition or human challenge tasks. The GES framework’s explicit partitioning and specialization approach could inspire CAPTCHA challenge adaptation strategies that maintain diverse challenge modalities without collapsing to single trivial modes.

Moreover, the zero-shot evaluation capability of GES specialists accelerating optimization reflects the need in bot defense to rapidly assess new challenge configurations without expensive retraining or manual tuning. The iterative territory refinement and competitive assignment procedure highlight principled methods for managing multiple expert models or challenge types, which may inform CAPTCHA lifecycle management where adversarial evolution and varied user interactions complicate challenge design.

Cite

@article{arxiv2606_07424,
  title={Rapid co-design of Buoyancy-assisted robots for Challenging Locomotion using Gaussian Evolutionary Specialists},
  author={Ankit Sinha and Nitish Sontakke and Dennis Hong and Yusuke Tanaka and Sehoon Ha},
  journal={arXiv preprint arXiv:2606.07424},
  year={2026},
  url={https://arxiv.org/abs/2606.07424}
}
