A Query Engine for the Agents

Source: arXiv:2605.27785 · Published 2026-05-27 · By Kenny Daniel

TL;DR

This paper addresses the challenge of querying and analyzing the fastest-growing form of production data: large volumes of unstructured text such as agent traces, chat logs, and model outputs. Traditional SQL engines and lakehouse architectures are ill-suited to this workload, especially inside client-side JavaScript runtimes, where AI applications often run a human user and an LLM agent in the same process. The author presents Hyperparam, a lightweight stack of three open-source JavaScript libraries (Hyparquet, Squirreling, Icebird) totaling under 70 KB gzipped, designed to run natively in the browser or in agent sandboxes. Hyperparam reads Parquet and Iceberg tables directly from object storage and interleaves classic analytical operators with asynchronous model-based UDFs (e.g., calls to llm()), which fire only when a downstream operator demands their result. Squirreling, the stack's async SQL engine, runs over 300× faster than DuckDB-WASM on filter-bounded queries, which sharply reduces inference cost in agent workloads. The system lets agent applications explore and reason over massive unstructured data interactively, without costly backend services or infrastructure.

Key findings

Hyparquet, Squirreling, and Icebird together have a total bundle size under 70 KB gzipped, compared to DuckDB-WASM’s 8 MB+ gzipped bundle.
Time to first row on a 40 GB dataset is 0.6 s for Hyperparam vs 19 s for DuckDB-WASM on a cold start, and 0.2 s vs 1.3 s when warm (Table 1).
Squirreling runs LLM-shaped async UDF queries over 300× faster than DuckDB-WASM on filter-bounded queries and 192× faster on sort-bounded queries (Table 2).
In an agent benchmark, Squirreling completes a 10-task analyst query suite at about one-third the cost of DuckDB-WASM, reducing input tokens billed by 3.67× (Table 3).
Per-cell lazy execution avoids unnecessary inference calls: a ‘LIMIT N’ query fires exactly N llm() calls regardless of underlying row counts.
Async pipeline concurrency in Squirreling allows up to 256 concurrent UDF invocations, whereas DuckDB-WASM serializes calls synchronously.
Icebird supports client-side resolution of Iceberg table snapshots and deletes, without backend mediation, using custom credential hooks.
Squirreling provides a permissive SQL parser and heterogeneous backend joins tailored to agent and human shared runtime environments.
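
The concurrency finding above can be illustrated with a small sketch. This is not Squirreling's implementation: `mapConcurrent` and `fakeLlm` are hypothetical names, and the 5 ms timer stands in for model latency. It only shows how a single-threaded async runtime can keep a bounded number of UDF calls in flight, whereas a synchronous engine waits for each call in turn.

```javascript
// Hypothetical helper: run `fn` over `items` with at most `limit` calls in flight.
// Results are returned in input order.
async function mapConcurrent(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // no await between read and increment, so no race
      results[i] = await fn(items[i], i);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Hypothetical stand-in for a model call; records the peak number of in-flight calls.
let inFlight = 0;
let peak = 0;
async function fakeLlm(text) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5)); // simulated latency
  inFlight--;
  return `label:${text.length}`;
}
```

Calling `mapConcurrent(rows, 256, fakeLlm)` on 1,000 rows needs roughly four waves of 5 ms each in the ideal case, while 1,000 strictly sequential calls would take about 5 s.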

Threat model

The system assumes trustworthy client-side environments hosting both human users and agents, with no explicit adversarial threat modeled. The primary challenge is minimizing costly LLM inference calls and query latency rather than preventing active attacks. Data resides in object storage accessible with proper credentials, and queries run client-side without backend mediation to avoid latency and billing overheads.

Methodology — deep read

The paper's focus is on enabling fast, interactive querying of large-scale unstructured text data accessed from object storage by agents and humans co-hosted in client-side JavaScript runtimes such as browser tabs or sandboxed Node contexts.

Threat Model and Assumptions: No adversary is modeled. The system trusts the client-side JavaScript runtime and treats inference cost and query latency, not attacks, as the problems to solve. It reduces both by coupling data access and LLM-based interpretation in one process, with no backend intermediary. The data lives in large Parquet files and Iceberg tables in remote object stores (S3, GCS, Azure).

Data: The benchmarks use a 40 GB Iceberg table of agent traces on S3 and a 50,000-row Parquet corpus of agent-trace logs for practical agent-style question answering. The data is unlabeled, unstructured text representing chat logs, reasoning chains, and tool-use traces. Metadata access relies on Parquet footers and Iceberg manifests.

Architecture: The system comprises three composable JavaScript libraries. Hyparquet (14 KB) reads Parquet column chunks using HTTP range requests and has zero dependencies. Icebird (32 KB) resolves Iceberg snapshots and applies deletes client-side. Squirreling (22 KB) is a custom async-native JavaScript SQL engine whose unit of computation is the deferred cell. Squirreling supports SQL queries with asynchronous UDFs such as llm(), each evaluated per cell and only when demanded downstream. The SQL dialect is permissive so that it tolerates model-generated SQL from agents. Queries compose operators as AsyncGenerators that yield rows as they become ready, which enables streaming and concurrent evaluation without thread pools.
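
To make the range-request idea concrete, here is a minimal sketch of locating a Parquet file's metadata without downloading the file. It relies only on the Parquet format's trailer layout (a 4-byte little-endian metadata length followed by the ASCII magic `PAR1`). The function names are hypothetical and this is not Hyparquet's API; a real reader may also fetch a larger suffix to save a round trip.

```javascript
// Parse the last 8 bytes of a Parquet file: a 4-byte little-endian metadata
// length followed by the ASCII magic "PAR1". Returns the metadata length.
function parseParquetTail(tail) {
  if (tail.byteLength !== 8) throw new Error("expected exactly 8 bytes");
  const magic = String.fromCharCode(tail[4], tail[5], tail[6], tail[7]);
  if (magic !== "PAR1") throw new Error("not a Parquet file");
  const view = new DataView(tail.buffer, tail.byteOffset, 8);
  return view.getUint32(0, true); // true = little-endian
}

// Build the inclusive HTTP Range header value covering the metadata block,
// which sits immediately before the 8-byte trailer.
function metadataRange(fileSize, metadataLength) {
  const end = fileSize - 8 - 1;
  const start = end - metadataLength + 1;
  if (start < 4) throw new Error("metadata length exceeds file size");
  return `bytes=${start}-${end}`;
}
```

A client would first request the trailer with the suffix range `bytes=-8`, pass those bytes to `parseParquetTail`, and then request the range returned by `metadataRange` to obtain the schema and row-group offsets.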

Training Regime: N/A, as this is a system/library paper focusing on engineering and benchmarks rather than ML model training. The proxy llm() UDF used in benchmarks simulates a 5 ms inference latency for cost isolation.

Evaluation Protocol: Benchmarks compare Hyperparam components to DuckDB-WASM 1.33.1-dev45.0 on filter-bounded, sort-bounded, and limit queries over a 10,000-row Parquet corpus. Metrics include wall-clock latency, UDF call count, cold-start time to first row, and cost measured by Anthropic API token billing during multi-round agent queries on a 50,000-row dataset. Ablations examine lazy per-cell execution versus bulk synchronous calls, streaming AsyncGenerator pipelining, and the impact of bundle size on cold start.

Reproducibility: All three libraries (Hyparquet, Squirreling, Icebird) are open-source JavaScript libraries with live demos that reference public multi-gigabyte Parquet and Iceberg datasets. The benchmarks specify versions and datasets, but the paper does not include full scripts. The datasets are either publicly accessible or reachable through object storage with credentials.

End-to-end Example: For a query such as SELECT llm('classify', content) FROM traces WHERE session_id = $id LIMIT 5, Hyparquet fetches only the Parquet chunks needed for the session_id predicate. Squirreling evaluates predicates lazily, decompresses only the necessary column cells, and issues exactly five asynchronous llm() inference calls, streaming results back as they complete. This contrasts with synchronous vectorized engines, which force all rows to be materialized and so incur excessive cost and latency.
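
The behavior in this example can be sketched with plain AsyncGenerators. The code below is an illustrative model, not Squirreling's source: the operator names (`scan`, `filter`, `project`, `limit`, `collect`), the in-memory table, and the mock `llm()` with a 5 ms delay are all assumptions. It shows why a deferred cell plus a pull-based LIMIT results in exactly five model calls.

```javascript
// Mock model call with a counter so the number of invocations can be checked.
let llmCalls = 0;
async function llm(task, text) {
  llmCalls++;
  await new Promise((resolve) => setTimeout(resolve, 5)); // simulated latency
  return `${task}:${text.length}`;
}

// Scan: yields rows one at a time from an in-memory table.
async function* scan(rows) {
  for (const row of rows) yield row;
}

// Filter: passes through rows that match the predicate.
async function* filter(source, predicate) {
  for await (const row of source) {
    if (predicate(row)) yield row;
  }
}

// Project: wraps each output cell in a thunk, so no work happens until it is called.
async function* project(source, columns) {
  for await (const row of source) {
    const cells = {};
    for (const [name, fn] of Object.entries(columns)) {
      cells[name] = () => fn(row); // deferred cell
    }
    yield cells;
  }
}

// Limit: stops pulling from upstream once n rows have been yielded.
async function* limit(source, n) {
  if (n <= 0) return;
  let seen = 0;
  for await (const row of source) {
    yield row;
    if (++seen >= n) return;
  }
}

// Collect: forces the cells of the surviving rows, letting them run concurrently.
async function collect(source) {
  const pending = [];
  for await (const cells of source) {
    const entries = Object.entries(cells).map(async ([k, thunk]) => [k, await thunk()]);
    pending.push(Promise.all(entries).then((e) => Object.fromEntries(e)));
  }
  return Promise.all(pending);
}

// SELECT llm('classify', content) AS label FROM traces WHERE session_id = $id LIMIT 5
function runQuery(traces, sessionId) {
  const matching = filter(scan(traces), (r) => r.session_id === sessionId);
  const projected = project(matching, { label: (r) => llm("classify", r.content) });
  return collect(limit(projected, 5));
}
```

Because `project` only creates thunks and `limit` stops pulling after five rows, the remaining rows of the table are never scanned and their cells are never evaluated.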

Technical innovations

Per-cell lazy evaluation for SQL queries integrating asynchronous LLM UDF calls, firing inference only on demanded output rows.
A fully client-side, minimal JavaScript stack combining Parquet/Iceberg direct object storage reads with async-native SQL query execution.
Streaming SQL operators implemented as JavaScript AsyncGenerators for concurrent, non-blocking query pipelines with pluggable backends.
A bundle over 100× smaller (~70 KB vs 8 MB+), enabling fast cold starts in browsers and agent sandboxes, which is critical for ephemeral AI data applications.

Datasets

40 GB Iceberg agent traces dataset — stored on S3 — used for time-to-first-row benchmarks.
50,000-row agent-trace Parquet corpus — public or proprietary usage — used for agent question answering benchmarks.
10,000-row Parquet corpus — used for async UDF latency benchmarks.

Baselines vs proposed

DuckDB-WASM: cold start time = 19 s vs Hyperparam: 0.6 s on 40 GB Iceberg table first row fetch (Table 1).
DuckDB-WASM: warm start time = 1.3 s vs Hyperparam: 0.2 s on 40 GB Iceberg table (Table 1).
DuckDB-WASM: 12,620 ms on filter-bounded queries vs Squirreling: 40 ms (over 300× speedup) with 606 vs 2,048 UDF calls (Table 2).
DuckDB-WASM: 61,667 ms on sort-bounded queries vs Squirreling: 321 ms (192× speedup) with ~10,000 calls each (Table 2).
DuckDB-WASM agent cost per 10-task pass: $0.203 vs Squirreling: $0.067 (67% cost reduction), both 50/50 correct (Table 3).
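
As a sanity check, the ratios quoted in this list can be recomputed from the raw figures. The snippet below uses only the numbers stated above; the object layout and helper names are illustrative.

```javascript
// Figures copied from the comparison above (Tables 1-3 as summarized here).
const reported = {
  coldStartSec: { duckdb: 19, hyperparam: 0.6 },
  warmStartSec: { duckdb: 1.3, hyperparam: 0.2 },
  filterBoundedMs: { duckdb: 12620, squirreling: 40 },
  sortBoundedMs: { duckdb: 61667, squirreling: 321 },
  agentCostUsd: { duckdb: 0.203, squirreling: 0.067 },
};

// Speedup = baseline time / proposed time.
function speedup(baseline, proposed) {
  return baseline / proposed;
}

// Relative reduction as a percentage of the baseline.
function percentReduction(baseline, proposed) {
  return ((baseline - proposed) / baseline) * 100;
}

// Round to one decimal place for display.
function round1(x) {
  return Math.round(x * 10) / 10;
}

const derived = {
  coldStartSpeedup: round1(speedup(reported.coldStartSec.duckdb, reported.coldStartSec.hyperparam)),
  warmStartSpeedup: round1(speedup(reported.warmStartSec.duckdb, reported.warmStartSec.hyperparam)),
  filterSpeedup: round1(speedup(reported.filterBoundedMs.duckdb, reported.filterBoundedMs.squirreling)),
  sortSpeedup: round1(speedup(reported.sortBoundedMs.duckdb, reported.sortBoundedMs.squirreling)),
  costReductionPct: round1(percentReduction(reported.agentCostUsd.duckdb, reported.agentCostUsd.squirreling)),
};

console.log(derived);
```

The filter-bounded ratio works out to about 315×, the sort-bounded ratio to about 192×, and the cost reduction to about 67%, which matches the rounded figures quoted in the list.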

Limitations

The system does not address workloads requiring multi-terabyte joins or aggregations exceeding local RAM; large-scale joins still require cluster backends.
Per-cell laziness bounds inference cost for LIMIT queries but cannot prevent runaway costs for unbounded SELECT llm(...) queries.
Authentication is a concern for private object stores; best practices differ across host runtimes (browser tokens, OS keychain, shell creds).
No adversarial security evaluation or robustness tests against malicious or corrupted input data were presented.
Agent benchmarks use a mock llm() UDF with artificial 5 ms latency rather than real-world LLM API latency or error modes.
A permissive SQL dialect combined with model-generated queries demands robust error handling; steering agents toward valid queries is non-trivial and remains an open UX area.
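
The runaway-cost limitation for unbounded queries can be made concrete with a call budget. The wrapper below is a hypothetical mitigation, not something the paper or Squirreling provides; `withCallBudget` and its parameters are invented for illustration. It shows one way a host application might cap the number of UDF invocations a single query can trigger.

```javascript
// Hypothetical guard: wraps an async UDF and rejects once a call budget is spent.
function withCallBudget(udf, maxCalls) {
  let used = 0;
  const guarded = async (...args) => {
    if (used >= maxCalls) {
      throw new Error(`UDF call budget of ${maxCalls} exhausted`);
    }
    used++; // count the call before awaiting so concurrent calls cannot overshoot
    return udf(...args);
  };
  guarded.used = () => used;
  return guarded;
}
```

A query that omits LIMIT would then fail fast after `maxCalls` invocations instead of billing one inference per row; whether to fail, pause, or ask the user is a design choice this sketch leaves open.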

Open questions / follow-ons

How can UX and rate-limiting mechanisms be designed to prevent runaway costs on unbounded LLM UDF queries?
Can the lazy async cell execution approach be extended to SQL features requiring global buffering such as sorting and grouping without large performance penalties?
What best practices and protocols can secure credential management for private object storage across diverse client runtimes?
How will Squirreling’s permissive SQL dialect interact with end-user trust and debugging workflows for model-generated queries in production?

Why it matters for bot defense

This work is highly relevant for bot-defense and CAPTCHA practitioners building AI-powered agent interfaces requiring on-demand, client-side querying of large unstructured logs and traces without backend delays. The per-cell lazy evaluation model offers a principled approach to bounding inference calls by demand, enabling cost-effective and responsive LLM-based analysis in untrusted or sandboxed environments.

Its small JavaScript bundle and zero dependency design serve as an example for integrating lightweight query engines natively in browser tabs or agent sandboxes, reducing attack surfaces and dependency complexity. Bot-defense systems often analyze large volumes of logs and user interactions; adopting or adapting such a native async query engine could improve throughput and user experience by tightly coupling inference with data access without backend round-trips or excessive inference costs.

Cite

@article{arxiv2605_27785,
  title={A Query Engine for the Agents},
  author={Kenny Daniel},
  journal={arXiv preprint arXiv:2605.27785},
  year={2026},
  url={https://arxiv.org/abs/2605.27785}
}
