Multi-Robot Coordination in V2X Environments

Source: arXiv:2605.06662 · Published 2026-05-07 · By John Pravin Arockiasamy, Alexey Vinel

TL;DR

This paper tackles a concrete gap in Connected, Cooperative, and Automated Mobility (CCAM) standardization: existing ETSI V2X facility-layer messages (CAM, VAM, CPM, MCM) were designed for vehicles and self-transmitting VRUs, and provide no semantic primitives for robots operating as active traffic mediators — entities that dynamically change roles, detect non-V2X pedestrians via onboard sensing, and must coordinate with heterogeneous peer robots without centralized infrastructure. The authors propose two new facility-layer services built on top of ETSI ITS-G5: the Robot Awareness Service (RAS), realized via the Robot Awareness Message (RAM), and the Robot Maneuver Coordination Service (RMCS), realized via the Robot Maneuver Coordination Message (RMCM). RAM extends CAM's container-based structure with robot-specific containers for operational role, task state, battery/health status, leader-follower relationships, and VRU cluster aggregation. RMCM provides event-driven bidirectional maneuver instructions and acknowledgments between leader and follower robots once roles are established through RAM.

The framework is validated on two fronts. First, a real-world proof-of-concept (POC) deploys a humanoid robot (PAL Robotics ARI) and a quadrupedal robot (Unitree Go2) in a controlled pedestrian-crossing scenario. Both robots run a formally specified five-state distributed finite-state machine (FSM) and communicate exclusively via IEEE 802.11p V2X on-board units. The coordination protocol proceeds through Idle, HelpRequested, RoleEstablished, ManeuverExecuting, and Termination states without any central controller or pre-pairing. Second, a SUMO + Artery simulation on a 3×3 Manhattan grid evaluates how robot-mediated VRU clustering affects pedestrian coverage and wireless channel load under three robot density configurations (0, 1, 9 robots) and three pedestrian population sizes (20, 50, 100), for both non-V2X and V2X-enabled pedestrian cohorts.

Results show that the real-world coordination achieves a mean negotiation latency below 150 ms across N=5 trials, and that deploying 9 robots in the simulated grid can reduce mean Channel Busy Ratio (mCBR) by up to 16.3% relative to the no-robot baseline in V2X-enabled pedestrian scenarios, while covering approximately 17–18% of non-V2X pedestrian travel time at safety-critical intersections with a 15 m sensing radius. The work is positioned as a standards-aligned extension to ETSI ITS-G5 and is accepted at IEEE ITSC 2026.

Key findings

Real-world negotiation latency (Tneg) for the initialPos maneuver was 0.074 ± 0.015 s across N=5 trials; for the pedestrian-escort Move maneuver it was 0.129 ± 0.102 s. Mean negotiation overhead is therefore below 150 ms in both cases on IEEE 802.11p hardware, although the large spread on the Move maneuver means individual trials may exceed that figure.
Total maneuver coordination time (TMCT) for initialPos was 0.574 ± 0.015 s; for the Move (escort) maneuver it was 21.729 ± 0.102 s, where the large Texec component (≈21.60 s) reflects actual physical robot travel time, not protocol overhead.
In the V2X-enabled pedestrian simulation (Case 2), deploying 9 robots with a 15m sensing radius reduced mCBR by up to 16.3% relative to the no-robot baseline in the 50-pedestrian scenario (Fig. 8), demonstrating meaningful channel load reduction through VAM suppression.
In the non-V2X pedestrian simulation (Case 1), 1 robot with a 15 m radius achieved approximately 5–6% pedestrian travel-time coverage (OBS metric) at intersections; 9 robots raised this to approximately 17–18%. Coverage therefore grows with robot count but sublinearly: nine times as many robots yield roughly three times the coverage (Fig. 7).
Reducing the sensing radius from 15 m to 10 m caused a consistent decrease in both OBS coverage and mCBR reduction across all pedestrian population sizes (20, 50, 100 pedestrians), indicating that the framework's benefits degrade gracefully with sensor range rather than collapsing.
The FSM fail-safe mechanism — where a follower transitions to a safe idle position and re-advertises via RAM if no RMCM command arrives within a safety timeout — provides bounded recovery without requiring centralized intervention, though quantitative timeout values are not specified in the available text.
The framework achieves backward ETSI ITS-G5 compatibility by treating robots as full ITS stations that receive CAM/VAM/CPM from surrounding participants, so no modifications to existing vehicle or VRU equipment are required.

Threat model

The paper is not a security-focused work, but implicitly assumes an operational threat model where the primary risks are communication loss (packet drop, timeout) and role ambiguity between robots. The adversary in the fail-safe design is treated as an environmental/channel failure, not a malicious actor. The framework inherits ETSI ITS-Sec mechanisms (authentication via PKI, message integrity, pseudonymized station IDs) to address impersonation and replay threats from external attackers, but the paper does not analyze what a compromised robot or a malicious RAM/RMCM injector could do to the coordination FSM. The VAM suppression mechanism — where a robot instructs nearby V2X pedestrians to stop transmitting — is a potential denial-of-service vector if a rogue robot broadcasts fake cluster-head RAM to silence legitimate VRU transmissions; this attack surface is not discussed.

Methodology — deep read

Threat model and assumptions: This is not primarily a security paper, but the adversary context is implicitly a communication-failure or role-ambiguity scenario in safety-critical urban traffic. The framework assumes all robots are authenticated ETSI ITS stations using ITS-Sec mechanisms (pseudonymized identifiers, integrity protection). It explicitly does NOT assume centralized infrastructure availability, prior robot pairing, or V2X capability in pedestrians or vehicles. The fail-safe assumption is that communication loss is detectable via timeout and handled by follower self-parking, not by a watchdog server.

Real-world POC — hardware and setup: Two heterogeneous robots were used: a humanoid (PAL Robotics ARI) and a quadruped (Unitree Go2). Each was equipped with an IEEE 802.11p dedicated short-range communication V2X on-board unit (OBU) from Herman. The experiment was conducted in a controlled environment without live vehicular traffic, for safety and repeatability. N=5 trials were run per maneuver type. The paper does not specify whether the site was indoor or outdoor, the exact range between robots, or whether the V2X OBUs were configured to the standard ETSI ITS-G5 5.9 GHz channel; the last of these is implied but not confirmed explicitly.

Coordination protocol — FSM design: Each robot independently executes a five-state distributed FSM: Idle → HelpRequested → RoleEstablished → ManeuverExecuting → Termination, after which the robot returns to Idle. State transitions are triggered by message events (RAM with helpStatus fields, RMCM receipt and acknowledgment) rather than by a shared clock or central orchestrator. The leader role is assumed by the pedestrian-detecting robot (ARI); the follower role is taken by the responding robot (Go2). The leader infers maneuver completion from updated position/status fields in subsequent follower RAM, not from a dedicated completion RMCM. This is a deliberate design choice that avoids adding a separate completion message, but it makes the leader dependent, in a polling-like way, on the freshness of periodic RAM.
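The state sequence above can be sketched as a small transition table. This is a minimal illustration, not the authors' implementation: the event names (`help_status_raised`, `roles_agreed`, and so on) are hypothetical labels for the RAM/RMCM triggers, and the timeout handling assumes the fail-safe applies to a follower that has an established role but has not yet received an RMCM command.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    HELP_REQUESTED = auto()
    ROLE_ESTABLISHED = auto()
    MANEUVER_EXECUTING = auto()
    TERMINATION = auto()


# (current state, event) -> next state. Event names are hypothetical labels
# for the RAM/RMCM triggers described in the text, not identifiers from the paper.
TRANSITIONS = {
    (State.IDLE, "help_status_raised"): State.HELP_REQUESTED,
    (State.HELP_REQUESTED, "roles_agreed"): State.ROLE_ESTABLISHED,
    (State.ROLE_ESTABLISHED, "rmcm_acknowledged"): State.MANEUVER_EXECUTING,
    (State.MANEUVER_EXECUTING, "completion_seen_in_ram"): State.TERMINATION,
    (State.TERMINATION, "released"): State.IDLE,
}


class RobotFsm:
    """Per-robot FSM; every robot runs its own instance, with no central controller."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def on_event(self, event: str) -> State:
        # Assumed fail-safe: a robot still waiting for an RMCM command falls
        # back to Idle on timeout (it would then re-advertise itself via RAM).
        if event == "safety_timeout" and self.state is State.ROLE_ESTABLISHED:
            self.state = State.IDLE
        else:
            # Unknown (state, event) pairs leave the state unchanged.
            self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Keeping the transitions in a plain dictionary makes it easy to check that every state has an exit and that out-of-order messages are ignored rather than causing an undefined transition.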

Message architecture: RAM follows ETSI CAM's container hierarchy. Mandatory containers: ItsPduHeader, GenerationDeltaTime, BasicContainer (position, station ID), RobotHighFrequencyContainer (kinematics). Optional containers: RobotLowFrequencyContainer, RobotStatusContainer (role, jobType, currentTaskStatus, healthStatus, operationMode, helpStatus), RobotLeaderFollowerOperationContainer (join/leave/breakup signaling), VruClusterInformationContainer, VruMotionPredictionContainer. RMCM has a CHOICE structure: either LeaderManeuverContainer (requestID, slaveAdviceList, followerAdviceList with jobAdvice/taskType) or FollowerManeuverContainer (jobAdviceID derived from leader's requestID, intersectionReferenceID, targetFollowerStationID, OperationMode). The requestID→jobAdviceID linkage allows each follower acknowledgment to be matched to the specific leader command it answers, even in a broadcast environment.
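To make the CHOICE structure and the requestID→jobAdviceID linkage concrete, here is a hedged sketch using Python dataclasses. The field types, the omission of slaveAdviceList, and the rule that jobAdviceID simply equals the leader's requestID are all assumptions made for illustration; the summary only states that the ID is "derived from" the request, and the real messages would be ASN.1-encoded rather than Python objects.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class LeaderManeuverContainer:
    request_id: int
    # (jobAdvice, taskType) pairs; the tuple representation is an assumption.
    follower_advice_list: Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class FollowerManeuverContainer:
    job_advice_id: int  # assumed here to equal the leader's requestID
    intersection_reference_id: int
    target_follower_station_id: int
    operation_mode: str


# An RMCM carries exactly one of the two containers (the CHOICE).
Rmcm = Union[LeaderManeuverContainer, FollowerManeuverContainer]


def match_ack(
    pending: Dict[int, LeaderManeuverContainer], msg: Rmcm
) -> Optional[LeaderManeuverContainer]:
    """Leader side: return and clear the command that a follower RMCM acknowledges.

    Returns None for leader-originated RMCMs and for unknown or repeated IDs.
    """
    if isinstance(msg, FollowerManeuverContainer):
        return pending.pop(msg.job_advice_id, None)
    return None
```

A leader would keep `pending` keyed by requestID for every command it has broadcast; because `match_ack` pops the entry, a duplicated acknowledgment is ignored the second time it arrives.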

Simulation setup: SUMO traffic simulator coupled with the Artery V2X networking stack on a 3×3 Manhattan grid (100m segments, sidewalks on both sides, crossings only at intersections). ETSI ITS-G5 PHY/MAC: IEEE 802.11p/EDCA, 5.9 GHz, 10 MHz channel, 200 mW TX power. Traffic mix: 20 vehicles (V2X-enabled, transmitting CAM) and 20/50/100 pedestrians. Robot deployment: 0 (baseline), 1 (central intersection 2,2), or 9 (one per intersection). Sensing radii tested: 10m and 15m. Crucially, pedestrian routes were randomly generated once and held fixed across all robot configurations to ensure controlled comparisons — this is good methodology for isolating the robot-deployment variable.

Evaluation metrics: Two primary metrics. (1) Observation Coverage Ratio (OBS) = cumulative time pedestrians spend inside any robot's observation zone / total pedestrian travel time × 100, used for Case 1 (non-V2X pedestrians). (2) Mean Channel Busy Ratio (mCBR) = time-averaged fraction of channel busy time over simulation duration, used for Case 2 (V2X-enabled pedestrians). For the real-world POC: Total Maneuver Coordination Time (TMCT) decomposed into Tneg (RMCM transmission to reception delta) and Texec (reception to ManeuverCompleted detection). Mean ± standard deviation reported over N=5 trials. No statistical significance tests (t-test, confidence intervals beyond ±std) are reported. No held-out adversarial communication scenario is tested.
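The three metrics defined above reduce to short formulas. The sketch below assumes per-pedestrian times in seconds, equally spaced CBR samples (so the time average is a plain mean), and three timestamps per maneuver; the function and parameter names are illustrative, not taken from the paper.

```python
from typing import Sequence, Tuple


def obs_coverage(time_in_zone_s: Sequence[float], travel_time_s: Sequence[float]) -> float:
    """OBS in percent: cumulative time inside any observation zone / total travel time."""
    total = sum(travel_time_s)
    return 100.0 * sum(time_in_zone_s) / total if total else 0.0


def mean_cbr(cbr_samples: Sequence[float]) -> float:
    """mCBR as the mean of equally spaced channel-busy-ratio samples (assumption)."""
    return sum(cbr_samples) / len(cbr_samples)


def tmct(t_sent: float, t_received: float, t_completed: float) -> Tuple[float, float, float]:
    """Split Total Maneuver Coordination Time into (Tneg, Texec, TMCT)."""
    t_neg = t_received - t_sent        # RMCM transmission to reception
    t_exec = t_completed - t_received  # reception to ManeuverCompleted detection
    return t_neg, t_exec, t_neg + t_exec
```

Using the reported means for initialPos, `tmct(0.0, 0.074, 0.574)` gives a Tneg of 0.074 s and a Texec of about 0.5 s, consistent with the reported TMCT of 0.574 s.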

Reproducibility: The paper does not mention code release, frozen weights (not applicable — no ML model), or public dataset release. The SUMO scenario configuration and Artery parameters are described at a level that would allow reconstruction but no repository link is provided. The real-world experiment used commercial hardware (Herman OBUs) that may not be universally accessible.

Technical innovations

RAM introduces an implicit VRU clustering strategy where the robot automatically acts as cluster head via onboard perception, bypassing the explicit VRU-initiated cluster negotiation required by ETSI VAM's VruClusterOperationContainer, thereby integrating non-V2X pedestrians who cannot self-transmit.
RMCM's CHOICE-typed container (LeaderManeuverContainer vs. FollowerManeuverContainer) with requestID→jobAdviceID chaining enables context-aware, role-asymmetric bidirectional maneuver acknowledgment without a centralized message broker, generalizing the infrastructure-assisted coordination pattern from prior work (cited as [17]) to fully decentralized robot pairs.
The RobotStatusContainer in RAM encodes task-level semantics (jobType, currentTaskStatus, helpStatus, operationMode) that are absent from CAM, allowing any ITS station — including existing vehicles — to reason about robot intent and capability without robot-specific firmware.
The five-state distributed FSM uses follower RAM updates (position, motion state, task progression) as implicit maneuver-completion signals rather than a dedicated RMCM completion message, reducing protocol overhead at the cost of requiring fresh periodic RAM to be available at the leader.
VAM suppression is applied conservatively and locally — only to V2X-enabled VRUs physically within the robot's observation zone — without modifying VAM semantics or requiring firmware changes on VRU devices, preserving backward compatibility while reducing channel load.
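The "conservative and local" scope of the suppression rule can be illustrated with a simple geometric filter. This is a sketch under stated assumptions: positions are planar coordinates in metres, the observation zone is a circle around the robot, and `vrus_in_zone` is a hypothetical helper; the summary does not describe how the robot actually signals cluster membership to VRU devices.

```python
import math
from typing import Dict, List, Tuple


def vrus_in_zone(
    robot_xy: Tuple[float, float],
    vru_positions: Dict[int, Tuple[float, float]],
    radius_m: float = 15.0,
) -> List[int]:
    """Return station IDs of V2X-enabled VRUs inside the robot's observation zone.

    Only these VRUs would be candidates for clustering and VAM suppression;
    everything outside the radius keeps transmitting as usual.
    """
    rx, ry = robot_xy
    return sorted(
        station_id
        for station_id, (x, y) in vru_positions.items()
        if math.hypot(x - rx, y - ry) <= radius_m
    )
```

With the robot at the origin, a VRU at (3, 4) is 5 m away and is included, while one at (20, 0) lies outside the default 15 m radius and is left untouched.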

Datasets

SUMO + Artery synthetic simulation — 3x3 Manhattan grid, 20 vehicles + 20/50/100 pedestrians, 3 robot configurations × 2 sensing radii — generated by authors, not publicly released
Real-world POC trials — N=5 per maneuver type, 2 robots (ARI humanoid + Unitree Go2), controlled environment — collected by authors, not publicly released

Baselines vs proposed

No-robot baseline (Case 1, non-V2X pedestrians): OBS coverage = 0% vs. 1-robot RAS (15m radius): ~5–6%, 9-robot RAS (15m radius): ~17–18%
No-robot baseline (Case 2, V2X pedestrians, 50 ped.): mCBR = ~0.00311 vs. 9-robot RAS (15m radius): mCBR reduced by 16.3% (Fig. 8)
No-robot baseline (Case 2, V2X pedestrians, 100 ped.): mCBR = ~0.00591 vs. 9-robot RAS (15m radius): mCBR reduced by 13.7% (Fig. 8)
No-robot baseline (Case 2, V2X pedestrians, 20 ped.): mCBR = ~0.00238 vs. 9-robot RAS (15m radius): mCBR reduced by 7.2% (Fig. 8)
1-robot RAS (10m radius, 50 ped.): mCBR reduced by 4.2% vs. 9-robot RAS (10m radius, 50 ped.): mCBR reduced by 7.8% relative to no-robot baseline (Fig. 8)
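The relative reductions above can be converted into approximate absolute mCBR values for the 9-robot, 15 m configuration. The numbers below are derived arithmetic from the approximate baselines listed here, not values reported in the paper.

```python
# Approximate no-robot baselines and 9-robot (15 m) reductions from the list above,
# keyed by pedestrian count.
BASELINE_MCBR = {20: 0.00238, 50: 0.00311, 100: 0.00591}
REDUCTION_PCT = {20: 7.2, 50: 16.3, 100: 13.7}


def reduced_mcbr(baseline: float, reduction_pct: float) -> float:
    """Absolute mCBR after applying a relative reduction given in percent."""
    return baseline * (1.0 - reduction_pct / 100.0)


for pedestrians in sorted(BASELINE_MCBR):
    value = reduced_mcbr(BASELINE_MCBR[pedestrians], REDUCTION_PCT[pedestrians])
    print(f"{pedestrians:>3} pedestrians: mCBR ~ {value:.5f}")
```

For 50 pedestrians this works out to roughly 0.00260, which shows that the channel is lightly loaded in absolute terms in every tested scenario, so the percentage reductions apply to a small baseline.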

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.06662.

Fig 6: Five-stage multi-robot coordination using V2X OBUs (white boxes) for pedestrian crossing assistance with RAM and RMCM.

Fig 7: Observation coverage ratio (OBS) for non-V2X pedes-

Fig 8: Mean channel busy ratio (mCBR) values from robot-

Fig 5: FSM governing decentralized multi-robot coordination

Limitations

Extremely small real-world trial count (N=5 per maneuver type) with only two robots and no live vehicular traffic; results establish proof-of-concept plausibility but cannot support statistical claims about reliability or latency distributions in actual urban deployments.
The simulation grid is static and idealized (fixed robot positions at intersections, no robot mobility, bounded sensing radius); the evaluation does not test dynamic robot repositioning, which is a core claimed advantage of mobile robots over fixed infrastructure.
No adversarial communication evaluation: the paper does not test behavior under packet loss rates, channel congestion, spoofed RAM/RMCM messages, or Sybil attacks, despite ITS-Sec compliance being listed as requirement R7.
Maneuver completion detection relies on follower RAM freshness at the leader; if RAM is lost or delayed under high channel load, the leader may incorrectly infer task state — the paper acknowledges this is compensated by subsequent broadcasts but provides no quantitative reliability bound for this inference mechanism.
OBS metric counts all pedestrian time inside observation zones equally, regardless of whether the pedestrian is actually in a safety-critical crossing situation; this inflates perceived coverage if robots detect pedestrians on sidewalks far from conflict points.
The framework's scalability argument for larger robot teams (3+ robots) is made through FSM design reasoning alone, with no simulation or empirical data for teams beyond two robots in the real-world scenario.
No comparison against alternative decentralized multi-robot coordination protocols (e.g., ROS 2 multi-agent frameworks, IEEE 802.15.4-based approaches) that might achieve similar coordination with lower standardization overhead.

Open questions / follow-ons

How do coordination latency and FSM reliability degrade under realistic ITS-G5 channel congestion (CBR > 0.6) with many simultaneous V2X participants, and what is the minimum RAM broadcast rate needed for the leader's implicit maneuver-completion inference to remain correct?
Can the VAM suppression mechanism introduced by robot cluster heads be exploited by a rogue or compromised robot broadcasting fraudulent RAM to silence legitimate VRU transmissions, and what ETSI ITS-Sec extension would mitigate this?
What is the optimal robot placement and mobility policy (static intersection coverage vs. dynamic patrol) for maximizing OBS coverage of non-V2X pedestrians across an entire urban grid, and how does this interact with channel load from RAM transmissions themselves?
How does the leader-follower FSM generalize to scenarios with more than two hierarchical levels or peer-to-peer role negotiation (e.g., two robots simultaneously detecting different pedestrians who need coordinated response), and does the current RMCM requestID/jobAdviceID scheme prevent command collision?

Why it matters for bot defense

At first glance this paper is far removed from bot-defense and CAPTCHA, but it is directly relevant to the broader problem of distinguishing and tracking non-credentialed, non-communicating entities (non-V2X pedestrians) in a mixed-agent environment — a structural analogy to the bot-defense problem of detecting and tracking non-credentialed browser agents in mixed human/bot traffic. The robot-as-cluster-head pattern, where a trusted, credentialed agent uses local perception to vouch for and aggregate the state of uncredentialed nearby entities, is architecturally similar to how a trusted client-side signal collector could aggregate behavioral signals from unauthenticated sessions. The implicit clustering strategy (no cooperation required from the clustered entity) maps onto passive behavioral fingerprinting approaches where the detection agent does not require the suspect session to self-report. Bot-defense engineers should note that the VAM suppression side-effect — a trusted agent causing uncredentialed peers to go silent — has a dual in CAPTCHA contexts where over-aggressive bot mitigation signals can suppress legitimate low-signal users.

More concretely, the FSM-based role establishment protocol (helpStatus negotiation, explicit leader/follower role assignment, RMCM acknowledgment chaining) is a clean example of a lightweight, decentralized challenge-response coordination protocol that achieves deterministic role assignment without a central authority. The sub-150ms negotiation latency achieved on commodity 802.11p hardware suggests that similarly lightweight, cryptographically-anchored role-establishment handshakes could be feasible in high-throughput web contexts. The mCBR reduction results also serve as a reminder that aggregation/clustering of redundant signals from many similar sources (bot farms generating structurally identical requests) can reduce both infrastructure load and detection complexity — a principle directly applicable to deduplication strategies in bot traffic analysis pipelines.

Cite

@article{arxiv2605_06662,
  title={Multi-Robot Coordination in V2X Environments},
  author={John Pravin Arockiasamy and Alexey Vinel},
  journal={arXiv preprint arXiv:2605.06662},
  year={2026},
  url={https://arxiv.org/abs/2605.06662}
}
