EQSANS-CLI: A natural-language, agent-ready command-line tool for small-angle neutron scattering data reduction at EQ-SANS

Source: arXiv:2605.00651 · Published 2026-05-01 · By Changwoo Do

TL;DR

EQSANS-CLI addresses a very specific but common pain point at SANS beamlines: data reduction is conceptually routine, but operationally cumbersome because many decisions have to stay consistent across dozens to hundreds of runs, multiple configurations, calibration steps, and stitching choices. The paper’s main move is not to invent a new reduction engine; it builds a new interface and state model around the existing EQ-SANS/drtsans reduction stack so the workflow becomes scriptable, resumable, and addressable by both humans and external AI agents.

The novel pieces are the shared command-handler layer, a persistent working table that makes all reduction decisions explicit row-by-row, and a dual interface where interactive prose and headless JSON both compile to the same executor. The result is an architecture where a single /autopilot invocation can go from IPTS loading through catalog matching, reduction, calibration, stitching, plotting, and sharing, while status tags prevent unnecessary recomputation. The paper also demonstrates that a Slack bot plus one skill document is enough to operate the headless interface over SSH from natural language, without embedding EQ-SANS-specific logic into the agent itself.

Key findings

The system reduces the workflow from an IPTS number to stitched I(Q) curves through a single /autopilot command that chains loading, matching, calibration, reduction, stitching, plotting, and sharing.
The working table stores one row per reduction unit and tracks status values ready, done, modified, and error; changed parameters trigger only modified rows to be re-reduced.
Natural-language input and slash commands are not separate code paths: prose is translated into slash commands and then dispatched through the same shared handler layer as typed commands and headless JSON requests.
The headless protocol is one slash command per line on stdin and one JSON object per line on stdout, with progress on stderr; the paper reports that the same command core is used by both the TUI and the headless engine.
The Slack demo used OpenClaw routed through OpenRouter to Xiaomi MiMo-V2-Pro, with the AGENT SKILL.md file loaded into the agent prompt; the paper reports that the headless interface was exercised by a local Python test harness, a Slack bot, and an experimental general-purpose agent, all without modifying EQSANS-CLI.
Configuration identity is encoded compactly from instrument metadata, e.g. 4m10a for 4 m / 10 Å / 60 Hz and 4m2.5a30hz for 4 m / 2.5 Å / 30 Hz, and the same identifier is used in the working table, command arguments, and output filenames.
Smart stitching uses preset overlap ranges when available, and otherwise an auto-overlap algorithm that starts with six centered Q points and widens symmetrically until each curve has at least two points inside; /stitch smart can also remove configurations that add no information.
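The configuration identifiers quoted above follow a visible pattern, which can be sketched as a small helper. This is a guess at the encoding based only on the four identifiers that appear in this summary (4m10a, 4m2.5a30hz, 2.5m2.5a, 8m12a); the function name `config_id` and the rule that 60 Hz is left implicit are assumptions, not the tool's documented behavior.

```python
def config_id(distance_m: float, wavelength_a: float, frequency_hz: int = 60) -> str:
    """Build a compact identifier such as '4m10a' or '4m2.5a30hz'."""
    # The 'g' format drops trailing zeros, so 4.0 -> '4' and 2.5 -> '2.5'.
    ident = f"{distance_m:g}m{wavelength_a:g}a"
    # Assumption: 60 Hz is the implicit default and only other
    # chopper frequencies are spelled out as a suffix.
    if frequency_hz != 60:
        ident += f"{frequency_hz}hz"
    return ident


print(config_id(4, 10, 60))    # 4 m / 10 A / 60 Hz
print(config_id(4, 2.5, 30))   # 4 m / 2.5 A / 30 Hz
```

Because the same string is reused in the working table, command arguments, and output filenames, a single helper of this kind would keep the three in step.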

Threat model

The system assumes an authorized human user or external AI agent operating within the SNS analysis environment, with access to ONCat, the SNS filesystem, and the eqsanscli-headless SSH endpoint. The main misuse cases are incorrect parameter edits, stale reruns after changes, and malformed natural-language requests; the CLI mitigates these by funneling all side effects through shared handlers, tracking row-level status, and blocking destructive shell commands from the natural-language path. It does not attempt to defend against a compromised operator, a malicious agent with valid SSH access, or a hostile host environment; the paper explicitly says the headless mode inherits the user’s existing credentials and trusts the remote operator.

Methodology — deep read

Threat model and assumptions: the paper is a workflow/infrastructure paper rather than a learning-systems paper, so the relevant adversary is an operator (human or external agent) who may issue incorrect, incomplete, or stale reduction commands, or resume after an interruption with partially updated state. The design assumes the user/agent is authorized to access the SNS analysis environment over SSH and that the CLI is the trusted executor. The paper explicitly restricts the natural-language route from emitting destructive shell commands such as /sh, /rm, and /mv, and it treats the CLI as authoritative: the agent may translate intent into commands, but it cannot directly invoke reduction functions or mutate the working table. The paper also notes that the headless SSH mode inherits existing user credentials and therefore trusts the operator at the other end of the connection; this is a convenience/security trade-off, not a hardened remote-attestation design.
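The restriction on the natural-language route can be illustrated with a simple filter applied to whatever the translator emits before anything is dispatched. This is a minimal sketch, not the paper's code: the names `BLOCKED_FROM_NL` and `filter_nl_commands` are hypothetical, and the blocked set holds only the three commands the summary names, which the text presents as examples rather than a complete list.

```python
# Commands the summary names as blocked on the natural-language path.
# The real list may be longer; this set is an assumption.
BLOCKED_FROM_NL = {"/sh", "/rm", "/mv"}


def filter_nl_commands(commands):
    """Split translator output into (allowed, rejected) slash commands."""
    allowed, rejected = [], []
    for line in commands:
        parts = line.strip().split()
        if not parts:
            continue  # ignore blank lines from the translator
        name = parts[0].lower()
        if name in BLOCKED_FROM_NL:
            rejected.append(line.strip())
        else:
            allowed.append(line.strip())
    return allowed, rejected


allowed, rejected = filter_nl_commands(
    ["/load ipts 38397", "/rm -rf out", "/matchruns", ""]
)
```

Only the `allowed` list would go on to the shared handlers; a user typing `/rm` directly is a separate path and is not affected by this filter.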

Data and state representation: the system operates on EQ-SANS facility metadata and raw data already present in the SNS ecosystem rather than on a new benchmark dataset. Catalog data are loaded from ONCat for a given IPTS number, where runs are classified into scattering, transmission, background scattering, background transmission, empty transmission, and empty scattering classes. The working table is then constructed from the catalog after /matchruns, with one row per reduction unit and columns for sample name, configuration, scattering run, matched transmission, matched background, matched background-transmission, matched empty-beam, sample thickness, and status. The paper gives an example table slice with 89 scattering runs across 2 configurations (2.5m2.5a and 4m10a), and another excerpt showing rows 38–41 with concrete run numbers and statuses ready/done/modified. There is no supervised train/test split because this is not an ML model training paper; the “data” are operational reduction records, catalog metadata, and file outputs. Preprocessing happens through classification heuristics on run titles: title prefixes like S- and T- are combined with keyword matching for background/empty markers, with order-sensitive rules so that terms such as banjo, emptyticell, or emptycell take precedence over generic empty-beam keywords.
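The order-sensitive title heuristics can be sketched as follows. Only the S-/T- prefixes and the markers banjo, emptyticell, and emptycell come from the summary; the generic empty-beam keywords, the choice to treat cell markers as background runs, and the default of "scattering" for unprefixed titles are assumptions made for illustration.

```python
# Cell markers named in the summary; checked first so that 'emptycell'
# is not swallowed by the generic 'empty' keyword below.
CELL_MARKERS = ("banjo", "emptyticell", "emptycell")
# Hypothetical generic empty-beam keywords (not listed in the summary).
EMPTY_BEAM_MARKERS = ("emptybeam", "empty")


def classify_run(title: str) -> str:
    """Map a run title to one of the six run classes."""
    t = title.strip().lower()
    # Prefix decides the measurement kind; unprefixed titles fall back
    # to scattering (assumption).
    kind = "transmission" if t.startswith("t-") else "scattering"
    compact = t.replace(" ", "").replace("_", "")
    if any(marker in compact for marker in CELL_MARKERS):
        return f"background {kind}"
    if any(marker in compact for marker in EMPTY_BEAM_MARKERS):
        return f"empty {kind}"
    return kind


print(classify_run("S- emptycell"))   # cell marker wins over 'empty'
print(classify_run("T- empty beam"))
```

Swapping the two `if` blocks would misclassify every empty-cell run as an empty-beam run, which is why the rule order matters.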

Architecture and algorithm: EQSANS-CLI has two binaries, eqsanscli for the interactive terminal UI and eqsanscli-headless for programmatic execution. Both route state-changing operations into a shared command-handler layer; the TUI additionally sends free-form prose to a natural-language router, which builds a prompt from curated domain knowledge (preset configs/knowledge.md), current session context, and recent history, then asks an LLM via OpenRouter to emit a sequence of slash commands. Those commands are parsed and dispatched through the same handlers. Handlers are organized by concern (catalog, matching, config, reduction, calibration, stitch, session, shell) and call pure-logic services that operate on the working table and integration wrappers. External systems include ONCat, the SNS filesystem, drtsans as a subprocess-based reduction engine, and a share endpoint for time-limited URLs. The configuration system encodes instrument settings into identifiers like 4m10a and stores per-configuration JSON reduction parameters in the drtsans eqsans reduction.json schema; /apply preset matches active configurations to curated presets, while /set config exposes only the subset of parameters that typically vary. The pipeline is designed to be idempotent: when a row’s relevant parameter changes, its status flips to modified and only that row is re-reduced on the next /reduce or /autopilot.
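The status-driven re-reduction described at the end of this paragraph can be sketched with a small in-memory table. The status names (ready, done, modified, error) are from the paper; the `Row` fields, the helper names, the run numbers, and the choice to skip rows in the error state are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Row:
    sample: str
    config: str
    scattering_run: int      # illustrative run number, not from the paper
    thickness: float
    status: str = "ready"    # one of: ready, done, modified, error


def set_thickness(row: Row, value: float) -> None:
    """Edit a parameter; a row that was already reduced becomes 'modified'."""
    if value == row.thickness:
        return  # no real change, so the status is left alone
    row.thickness = value
    if row.status == "done":
        row.status = "modified"


def reduce_pending(rows, reduce_fn):
    """Run reduce_fn only on rows that still need work; return their runs."""
    reduced = []
    for row in rows:
        if row.status not in ("ready", "modified"):
            continue  # 'done' rows are skipped; 'error' rows too (assumption)
        try:
            reduce_fn(row)
            row.status = "done"
            reduced.append(row.scattering_run)
        except Exception:
            row.status = "error"
    return reduced


table = [Row("sample_A", "4m10a", 1001, 1.0), Row("sample_B", "4m10a", 1002, 1.0)]
first = reduce_pending(table, lambda row: None)   # both rows are reduced
set_thickness(table[1], 2.0)                      # second row -> modified
second = reduce_pending(table, lambda row: None)  # only the edited row reruns
```

Calling `reduce_pending` again without further edits returns an empty list, which is the idempotence the paragraph refers to.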

Training or operational regime: there is no model training in the usual sense, because the LLM is only used as a translator and the paper does not fine-tune it. Operationally, the system runs interactively in a terminal or headlessly over SSH on the SNS analysis cluster. The headless protocol is minimal: one slash command per line on stdin, one JSON object per line on stdout with fields success, message, and data, and progress lines on stderr prefixed with progress:. Session state, working table, catalog, and command history are saved after each command and on exit; /continue restores them. The paper does not report epochs, batches, seeds, or optimizer settings because those do not apply to the core contribution. For the agent demo, the setup uses OpenClaw as the outer agent framework, OpenRouter as the LLM gateway, and Xiaomi MiMo-V2-Pro as the model; again, no training loop is described.
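A client for the headless protocol needs little more than line-oriented I/O and a JSON parser. The sketch below assumes only what the summary states (one command per line in, one JSON object per line out with fields success, message, and data, and stderr lines prefixed with progress:); the function names are hypothetical, and the streams are plain text file objects so the same code could sit on top of an SSH subprocess.

```python
import json

REQUIRED_FIELDS = {"success", "message", "data"}


def parse_response(stdout_line: str) -> dict:
    """Decode one stdout line and check the three documented fields."""
    reply = json.loads(stdout_line)
    missing = REQUIRED_FIELDS - reply.keys()
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return reply


def parse_progress(stderr_line: str):
    """Return the text after 'progress:' or None for other stderr output."""
    prefix = "progress:"
    line = stderr_line.strip()
    if line.startswith(prefix):
        return line[len(prefix):].strip()
    return None


def send(command: str, to_engine, from_engine) -> dict:
    """Write one slash command and read back exactly one JSON reply."""
    to_engine.write(command.rstrip("\n") + "\n")
    to_engine.flush()
    return parse_response(from_engine.readline())
```

Keeping replies on stdout and progress on stderr means a client can block on one JSON line per command without having to filter progress chatter out of the result stream.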

Evaluation protocol and concrete example: evaluation is qualitative and system-level. The paper demonstrates end-to-end operation in three environments: the interactive terminal, a headless JSON backend, and a Slack-resident agent. A typical human workflow is shown as /load ipts 38397, inspection and possible /reclass edits, then /matchruns, which reports a summary such as “Matched 89 scattering runs across 2 configurations. Configurations: 2.5m2.5a, 4m10a. Transmission matched: 89/89. Background matched: 89/89. Empty beam matched: 89/89.” Calibration is then done by reducing a porous silica standard with standardabsolutescale = 1, calling /calibrate <porsil file> --applynow to compute the absolute scale factor over a reference Q-range, and letting that change propagate to affected rows via modified status. Stitching is tested with /stitch build and /stitch smart, where preset overlap windows are used when available (e.g. 4m10a with 2.5m2.5a uses [0.05, 0.06] Å−1) and the fallback auto-overlap algorithm widens from six centered Q points until each curve has at least two points. The agent demo is a concrete end-to-end scenario: the user asks Slack to create a directory, set outputdir, reduce only 70°C data from the 4m10a config using emptycell as background, and show the plan before executing; the bot returns a numbered plan of slash commands, waits for confirmation, then executes and returns progress. The paper does not report quantitative success rates, latency, or human-study metrics, and it does not present a statistical comparison against the older script-based workflow.
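The fallback auto-overlap rule is described only in one sentence, so the following is one possible reading rather than the paper's algorithm: pool the Q points of both curves that fall inside their shared Q range, start from the six points around the middle of that pool, and widen by one point on each side until each curve has at least two points in the window. The function name and the way "centered" is interpreted are assumptions.

```python
def auto_overlap(q_low, q_high, start_points=6, min_points=2):
    """Return (q_min, q_max) for the stitch window, or None if none exists."""
    if not q_low or not q_high:
        return None
    # Q range covered by both curves.
    lo = max(min(q_low), min(q_high))
    hi = min(max(q_low), max(q_high))
    shared = sorted(q for q in list(q_low) + list(q_high) if lo <= q <= hi)
    if not shared:
        return None
    # Start with `start_points` points around the middle of the shared pool.
    mid = len(shared) // 2
    half = start_points // 2
    left = max(0, mid - half)
    right = min(len(shared), mid + half)  # exclusive index
    while True:
        q_min, q_max = shared[left], shared[right - 1]
        enough = all(
            sum(q_min <= q <= q_max for q in curve) >= min_points
            for curve in (q_low, q_high)
        )
        if enough:
            return q_min, q_max
        if left == 0 and right == len(shared):
            return None  # the whole shared range is still too sparse
        # Widen symmetrically by one point per side.
        left = max(0, left - 1)
        right = min(len(shared), right + 1)


q_a = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
q_b = [0.05, 0.07, 0.09, 0.11, 0.13]
window = auto_overlap(q_a, q_b)
```

A preset window such as [0.05, 0.06] Å−1 would simply bypass this function; the fallback is only needed when no preset matches the pair of configurations.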

Reproducibility: the source code is reported as available at https://github.com/cw-do/eqsanscli. The paper also mentions the AGENT SKILL.md file as the sole agent-side integration artifact, and the headless interface is described precisely enough to reconstruct the protocol. However, the dataset is not a public benchmark; the concrete runs, IPTS examples, and facility files are internal to SNS/ONCat, and the paper does not provide frozen outputs, release artifacts, or a reproducibility checklist. Because the paper is primarily an architecture/workflow contribution, the strongest reproducibility signal is the stable command contract rather than a fixed experimental corpus.

Technical innovations

A single authoritative command-handler layer is reused for typed commands, natural-language translation, and headless JSON requests, eliminating separate execution paths.
The working table makes reduction state explicit and editable at row granularity, turning hidden script variables into persistent, inspectable system state.
Status-driven re-reduction treats parameter changes as first-class events so that only affected rows rerun after edits or resume points.
Smart stitching combines preset overlap windows with a centered auto-overlap fallback and optional LLM-suggested edits that still require confirmation.
The agent integration pattern is deliberately thin: a stable slash-command contract plus one skill file is enough for an external chat agent to operate the CLI over SSH.
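The first point, one handler layer behind every entry path, can be sketched as a registry plus a single `dispatch` function. Everything here is illustrative: the handler bodies are stand-ins, `translate` stands in for the LLM-backed router, and only the command names /load and /matchruns and the success/message/data reply shape are taken from the summary.

```python
import json

HANDLERS = {}


def command(name):
    """Register a handler under a slash-command name."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


def reply(success, message, data=None):
    return {"success": success, "message": message, "data": data}


@command("/load")
def load(session, args):
    # Stand-in: a real handler would query the run catalog here.
    session["ipts"] = args[-1]
    return reply(True, f"loaded IPTS {args[-1]}")


@command("/matchruns")
def matchruns(session, args):
    if "ipts" not in session:
        return reply(False, "no IPTS loaded")
    return reply(True, "matched runs", {"ipts": session["ipts"]})


def dispatch(session, line):
    """The single execution path shared by every front end."""
    parts = line.split()
    if not parts or parts[0] not in HANDLERS:
        return reply(False, f"unknown command: {line.strip()!r}")
    return HANDLERS[parts[0]](session, parts[1:])


def run_typed(session, line):            # interactive slash command
    return dispatch(session, line)


def run_headless(session, line):         # one JSON object per command
    return json.dumps(dispatch(session, line))


def run_prose(session, text, translate): # prose -> slash commands -> dispatch
    return [dispatch(session, cmd) for cmd in translate(text)]
```

Because the three entry points differ only in how they obtain and format a command line, a fix or guardrail added inside `dispatch` or a handler applies to humans, scripts, and agents alike.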

Datasets

ONCat run catalog — facility metadata for an IPTS; exact size not specified — ONCat / SNS internal metadata service
EQ-SANS raw NeXus event data — not specified; facility internal — /SNS/EQSANS/... on the SNS analysis cluster
Porous silica standard reductions — not specified; facility internal standard data — SNS filesystem

Baselines vs proposed

Legacy standalone Python scripts: not quantitatively compared in the paper; proposed system claims reduced coordination burden but no numeric delta is reported
Manual multi-step terminal workflow: not quantitatively compared; proposed /autopilot collapses load → match → reduce → stitch/share into one invocation
Headless agent vs interactive terminal: same command core and same handlers; no separate accuracy metric reported
Preset stitching vs auto-overlap fallback: example preset [0.05, 0.06] Å−1 for 4m10a + 2.5m2.5a and [0.025, 0.028] Å−1 for 8m12a + 4m10a; no numeric quality metric reported

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.00651.

Fig 2: Interactive terminal interface of EQSANS-CLI. The interface displays the current session

Fig 4: Setting up the Slack-resident agent for EQSANS-CLI. The user instructs the OpenClaw

Fig 5: Loading catalogue information from the desired IPTS.

Fig 6: Asking AI agent to reduce data with specific workflow instructions.

Limitations

The paper reports no quantitative user study, latency, error-rate, or throughput measurements versus the prior script-based workflow.
The Slack/agent demonstration is a proof of feasibility, not a benchmark of reliability, success rate, or adversarial robustness.
The headless SSH mode explicitly trusts the operator at the remote end; there is no stronger authentication or policy layer described.
Natural-language translation depends on curated knowledge.md and an LLM provider; behavior on out-of-distribution phrasing is not systematically evaluated.
Stitch overlap selection is partly heuristic and partly preset-based; the paper does not report failure cases or validation against ground truth overlaps.
The dataset and facility environment are internal to SNS/ONCat, which limits external reproducibility without access to the same infrastructure.

Open questions / follow-ons

How well does the natural-language router generalize to ambiguous, incomplete, or conflicting reduction requests that are not covered by knowledge.md?
Can the command contract be extended to stronger safety properties, such as policy checks or typed validation, without losing the current simplicity?
What is the empirical error rate of automatic run classification and matching on real IPTS datasets, especially for messy titles and temperature-series experiments?
Can the same agent-ready pattern be applied to downstream analysis steps, and where do human approval boundaries need to sit for those steps?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, the paper is interesting less as a SANS tool and more as an example of how to make a complex workflow machine-addressable without giving the agent direct access to internal logic. The architectural pattern is: keep the executor deterministic and auditable, expose a narrow command vocabulary, persist state in a structured table, and let the LLM only translate intent into commands. That is directly relevant to building agent-facing anti-abuse tooling, where you want humans, scripts, and agents to hit the same policy-enforced path rather than parallel code paths with different trust guarantees.

From a defensive perspective, the design also shows the limits of “agent-ready” interfaces: once an agent can issue valid commands over a stable contract, security shifts to authentication, authorization, and command validation, not prompt engineering. The paper’s explicit restriction on destructive shell commands and its use of confirmation for smart-stitch proposals are small but important examples of guardrails that would map cleanly to bot-defense systems, moderation consoles, or CAPTCHA-adjacent review workflows. The Slack demo is also a reminder that if your workflow already lives in a chat channel, agents will naturally enter through that channel unless you provide a safer, narrower API first.

Cite

@article{arxiv2605_00651,
  title={EQSANS-CLI: A natural-language, agent-ready command-line tool for small-angle neutron scattering data reduction at EQ-SANS},
  author={Changwoo Do},
  journal={arXiv preprint arXiv:2605.00651},
  year={2026},
  url={https://arxiv.org/abs/2605.00651}
}
