DarkAgents
Source: arXiv:2606.11157 · Published 2026-06-09 · By Michele Lucente, Silvia Pascoli, Filippo Sala, Matteo Zandi
TL;DR
DarkAgents is a pioneering multi-agent system designed specifically for theoretical astroparticle physics (TAP), addressing its unique challenges such as multi-disciplinary model building, complex pipeline computations, and auditing of assumptions that impact scientific validity. Unlike prior AI-assisted scientific workflows focused on collider physics or cosmology, DarkAgents combines the reasoning and code-generation capabilities of large language models (LLMs) with deterministic, human-written backend code to create modular, auditable, and LLM-agnostic research pipelines. The first implementation, DarkAgent-PT, focuses on cosmological first-order phase transitions (FOPT), analyzing classically scale-invariant particle physics models against nanohertz gravitational wave data from the NANOGrav PTA experiment. DarkAgent-PT autonomously reproduces posterior parameter distributions consistent with human research, identifies relevant experimental and astrophysical constraints, and audits implicit and explicit assumptions throughout the workflow. The system also discovered inconsistencies in existing fit results and generated novel parameter fits with dissipative bulk flow gravitational wave templates, demonstrating the real scientific value of agentic workflows in TAP.
The authors validated DarkAgents by comparing Bayesian posterior distributions from the agentic pipeline against human-executed analyses using the same backend. They tested robustness across multiple state-of-the-art LLMs including Anthropic's Claude Code and OpenAI's Codex, finding that more capable models could produce near-autonomous end-to-end runs, while less capable ones required more human oversight. The framework’s modular design allows flexible extension to other TAP problems, enabling community-driven development of specialized DarkAgents addressing different astrophysical and particle physics challenges. The code and extensive examples have been made publicly available, underscoring the goal of fostering sustainable, reproducible, and human-auditable AI-assisted science in TAP.
Key findings
- DarkAgent-PT reproduces Bayesian posterior distributions on NANOGrav 15yr gravitational wave data consistent with traditional human analysis (Fig. 2 comparison).
- State-of-the-art LLMs (e.g., Claude Code Opus 4.8, Codex GPT-5.5) can autonomously complete the full DarkAgent-PT workflow with minimal human guidance.
- Less capable LLMs (e.g., Mistral Vibe) require stronger human guidance and are less reliable at identifying constraints and implicit assumptions.
- DarkAgent-PT correctly identifies and rejects the sound-wave gravitational wave spectrum template outside its regime of validity in favor of the dissipative bulk-flow template.
- The constraint-sub-agent discovers relevant particle physics, cosmological (ΔNeff), and astrophysical constraints (e.g., SN1987A cooling, beam-dump experiments), while flagging missing constraint coverage (e.g., kinetic and scalar mixing) and avoiding hallucinations.
- The prior-sub-agent audits model and pipeline assumptions, diagnosing limitations such as renormalization scale choice, daisy resummation, and bubble wall velocity calculation absent from the backend.
- DarkAgents halts safely when incompatible models are given, preventing silent failure by stopping and reporting incompatibility rather than forcing invalid workflows.
- Hallucinated references occasionally appear in generated reports, indicating the need for cautious human supervision of literature citation.
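The safe-halt behaviour listed in the findings above (stop and report rather than force an invalid workflow) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the DarkAgents code: the `ModelSpec` type, the `REQUIRED_FEATURES` set, and the assumption that the backend's requirement can be expressed as a feature tag such as `classically_scale_invariant`.

```python
from dataclasses import dataclass


class IncompatibleModelError(RuntimeError):
    """Raised to stop the workflow instead of forcing an unsupported model through it."""


@dataclass(frozen=True)
class ModelSpec:
    # Hypothetical description of a user-supplied model.
    name: str
    features: frozenset


# Assumption for illustration: the backend only supports models carrying these tags.
REQUIRED_FEATURES = frozenset({"classically_scale_invariant"})


def check_compatibility(model: ModelSpec) -> None:
    """Raise if the model lacks any feature the backend is assumed to need."""
    missing = REQUIRED_FEATURES - model.features
    if missing:
        raise IncompatibleModelError(
            f"{model.name}: backend requires {sorted(missing)}; halting."
        )


def run_or_halt(model: ModelSpec) -> dict:
    """Return an explicit status record; never continue silently on a bad model."""
    try:
        check_compatibility(model)
    except IncompatibleModelError as err:
        return {"status": "halted", "reason": str(err)}
    return {"status": "ok", "model": model.name}
```

The design point is that an incompatibility surfaces as an explicit, reportable status instead of a pipeline that runs to completion on inputs the backend was never meant to handle.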
Threat model
The adversary is the intrinsic stochasticity and hallucination-prone behavior of large language models, which could produce incorrect intermediate or final scientific results. The system assumes a cooperating user who provides prompts and reviews reports, but requires safeguards to detect hallucinations, silent failures, and incompatibilities, ensuring reliability through modular deterministic backends and human audit points. There is no adversarial attacker actively trying to subvert results or manipulate the models.
Methodology — deep read
DarkAgents automates complex theoretical workflows in TAP by orchestrating multiple AI sub-agents alongside deterministic, human-written physics backends. The intended user is a researcher who wants AI assistance but requires reliability and auditability; the risk to guard against is therefore LLM-generated errors, hallucinations, or silent failures that compromise scientific correctness.
Input data includes user prompts describing a particle physics model or a rough idea. Data sources for constraints and literature come from public experimental, cosmological, and astrophysical results. The system supports multiple LLM providers via agentic command-line tools: Anthropic's Claude Code, OpenAI's Codex, Mistral's models, or local models via Ollama. Each sub-agent is injected with detailed instructions in Markdown describing scope, workflow, input/output files, and rules to minimize hallucination.
The orchestrator coordinates the sub-agents: proposal, librarian (literature review), critic (model consistency check), FOPT (phase transition computation), PTA (pulsar timing array data fit), constraint (experimental and observational limits), prior (assumption audit), and report. It executes them sequentially, with user interaction possible between stages. Outputs include Markdown and structured JSON files for downstream handoff, ensuring traceability.
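The sequential orchestration with JSON handoffs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stage names mirror the sub-agent list, but `run_pipeline`, `make_stub_agents`, the `confirm` callback, and the handoff layout are all hypothetical, and the stub agents stand in for real LLM calls.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict

# Stage order taken from the summary; everything else below is illustrative.
STAGES = ["proposal", "librarian", "critic", "fopt", "pta", "constraint", "prior", "report"]

Agent = Callable[[dict], dict]


def run_pipeline(
    prompt: str,
    agents: Dict[str, Agent],
    workdir: Path,
    confirm: Callable[[str, dict], bool] = lambda stage, handoff: True,
) -> dict:
    """Run the stages in order, writing one JSON handoff file per stage."""
    handoff = {"prompt": prompt, "history": []}
    for stage in STAGES:
        output = agents[stage](handoff)
        handoff = {**handoff, stage: output, "history": handoff["history"] + [stage]}
        (workdir / f"{stage}.json").write_text(json.dumps(handoff, indent=2))
        # Human-in-the-loop hook: returning False stops before the next stage.
        if not confirm(stage, handoff):
            break
    return handoff


def make_stub_agents() -> Dict[str, Agent]:
    """Hypothetical stand-ins that only record how many stages ran before them."""
    def make(stage: str) -> Agent:
        return lambda handoff: {"summary": f"{stage} saw {len(handoff['history'])} earlier stages"}
    return {stage: make(stage) for stage in STAGES}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = run_pipeline("toy model", make_stub_agents(), Path(tmp))
        print(result["history"])
```

Writing the full handoff after every stage means a run stopped at a review point still leaves a complete record of what each earlier stage produced.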
The FOPT-PTA pipeline relies on an efficient semi-analytic code [14] to compute the effective potential and the phase transition parameters (temperature, strength, inverse duration, bubble velocity). It then fits gravitational wave spectrum templates (e.g., dissipative bulk flow) to NANOGrav data using PTArcade with MCMC Bayesian inference. The system uses dimensional analysis and literature priors to optimize parameter space exploration.
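For readers unfamiliar with the inference step, the idea behind MCMC Bayesian fitting can be shown with a toy Metropolis sampler. This is not PTArcade and not the paper's likelihood: it assumes a single parameter `theta`, a Gaussian likelihood with known width, and a flat prior within hypothetical bounds, purely to show how posterior samples are drawn.

```python
import math
import random


def log_posterior(theta: float, data: list, sigma: float, bounds: tuple) -> float:
    """Flat prior inside `bounds` plus a Gaussian log-likelihood (up to a constant)."""
    lo, hi = bounds
    if not lo <= theta <= hi:
        return -math.inf
    return -0.5 * sum((x - theta) ** 2 for x in data) / sigma**2


def metropolis(log_post, theta0: float, n_steps: int, step: float, rng: random.Random) -> list:
    """Random-walk Metropolis: propose a Gaussian jump, accept with prob min(1, ratio)."""
    theta, lp = theta0, log_post(theta0)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            theta, lp = proposal, lp_new
        chain.append(theta)
    return chain


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(2.0, 1.0) for _ in range(20)]  # synthetic toy data
    chain = metropolis(lambda t: log_posterior(t, data, 1.0, (-10.0, 10.0)), 0.0, 20000, 0.5, rng)
    samples = chain[2000:]  # discard burn-in
    print(sum(samples) / len(samples))
```

In a real analysis the likelihood would come from the PTA data and the gravitational wave template, and the parameters would be the phase transition quantities; the accept/reject logic is the only part this sketch is meant to convey.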
Training per se is not applicable, as LLMs are used via APIs or local models without fine-tuning. Instead, carefully engineered prompts and sub-agent instruction files control behavior. Evaluation consists of repeated runs across LLMs and multiple seeds to measure stability and correctness, contrasting posterior distributions with independent human analyses to confirm fidelity.
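One simple way to contrast an agentic posterior with a human-run one is to compare medians in units of the combined 68% half-widths. The sketch below is an assumption about how such a check could look, not the comparison the authors performed; `summary` and `shift_in_sigma` are hypothetical helpers operating on one-dimensional posterior samples.

```python
import math
import statistics


def summary(samples: list) -> tuple:
    """Return (median, 16th percentile, 84th percentile) of a 1D sample set."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return statistics.median(samples), cuts[15], cuts[83]


def shift_in_sigma(samples_a: list, samples_b: list) -> float:
    """Median difference divided by the quadrature sum of the 68% half-widths."""
    med_a, lo_a, hi_a = summary(samples_a)
    med_b, lo_b, hi_b = summary(samples_b)
    half_a = 0.5 * (hi_a - lo_a)
    half_b = 0.5 * (hi_b - lo_b)
    return abs(med_a - med_b) / math.sqrt(half_a**2 + half_b**2)
```

A value well below 1 would indicate that the two posteriors agree within their widths for that parameter; repeating the check across seeds and LLMs gives a rough stability measure. It ignores parameter correlations, so it complements rather than replaces a visual comparison of the full posteriors.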
Reproducibility is supported by open source code and detailed example runs including prompt sets, generated scripts, reports, JSON handoffs, and figures at https://github.com/PhysicsZandi/DarkAgents. However, some literature hallucinations remain and require human oversight. The architecture is modular, allowing new pipelines and branches to be added without re-engineering, facilitating community expansion.
Technical innovations
- End-to-end multi-agent architecture combining LLM reasoning with deterministic, tested human-coded physics backends for TAP research.
- Audit sub-agents explicitly identify and report implicit and explicit assumptions and priors affecting physical results, unique to the TAP domain.
- LLM-agnostic design built on agentic command-line tools, so the same workflow runs with Claude, OpenAI, Mistral, or local LLMs.
- Human-in-the-loop orchestrator that can pause the workflow after each sub-agent for expert interaction or run autonomously, balancing control and efficiency.
- Flexible modular pipeline structure supports integration of diverse physics tools and iterative refinement mimicking human research workflows.
Datasets
- NANOGrav 15yr dataset — public PTA gravitational wave data — see arXiv:2306.16213
- Literature and experimental constraints from collider physics, astrophysics, cosmology — aggregated from public sources
Baselines vs proposed
- Human expert analysis: Bayesian posterior distribution on NANOGrav data — matches DarkAgent-PT with minimal deviation (Fig. 2)
- Claude Code (Opus 4.8) LLM: completes full pipeline autonomously in multiple runs — reliable results
- Codex (GPT-5.5) LLM: similar autonomous pipeline completion and physical correctness
- Mistral Vibe (mistral-medium-3.5) LLM: requires strong human supervision and is less reliable at recognizing constraints
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.11157.

Fig 1: Architecture of DarkAgents. An orchestrator organises and coordi-
Limitations
- The current implementation handles only a single pipeline branch (FOPT gravitational wave analysis); others are planned but not released.
- LLM hallucinations occasionally produce fabricated literature citations requiring human vetting.
- Certain constraint classes (e.g., kinetic and scalar mixing bounds) are not fully implemented; the system flags them but cannot apply them.
- Reduced reliability with less capable LLMs necessitates continued human supervision and correction.
- No adversarial robustness evaluation or simulated attacks on the agentic workflow reported.
- No formal statistical significance tests or cross-validation beyond multiple independent LLM runs described.
Open questions / follow-ons
- How to extend DarkAgents to support multiple interconnected TAP problems and enable cross-branch communication among different DarkAgents?
- Can future versions reduce hallucinations further or automatically flag/add missing phenomenological analyses without human intervention?
- What are effective methods to integrate new physics constraints (e.g., kinetic mixing) dynamically with algorithmic computation of relevant quantities?
- How can the architecture adapt to future more powerful LLMs to decrease human-in-the-loop while maintaining high reliability?
Why it matters for bot defense
DarkAgents demonstrates a sophisticated multi-agent AI orchestration framework integrating LLM reasoning with deterministic, auditable code to produce scientifically reliable results. For bot-defense and CAPTCHA practitioners, this work illustrates practical strategies to mitigate hallucination risks when deploying LLMs for critical pipelines, including modular design, stepwise human audit, and backend validation. The system's LLM-agnostic setup offers insight into flexible agent orchestration that could inspire robust CAPTCHA or bot-detection systems harnessing multiple AI tools.
Additionally, the approach of automatically identifying and auditing implicit priors and assumptions could inspire better transparency and trustworthiness assessments in security-critical AI applications. However, the domain-specific nature of the physics pipelines means direct application requires careful adaptation, though the principles of controlled multi-agent interaction, assumption auditing, and human-in-the-loop supervision remain highly relevant for securing complex LLM-powered workflows.
Cite
@article{arxiv2606_11157,
title={DarkAgents},
author={Michele Lucente and Silvia Pascoli and Filippo Sala and Matteo Zandi},
journal={arXiv preprint arXiv:2606.11157},
year={2026},
url={https://arxiv.org/abs/2606.11157}
}