MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains
Source: arXiv:2605.29795 · Published 2026-05-28 · By Ashutosh Ojha, Vinay Aggarwal, Ashutosh Srivastava, Siddharth Yedlapati, Yaman K Singla, Jitendra Ajmera
TL;DR
MEMENTO addresses the challenge of learning in low-data domains by leveraging the open web itself as a dynamic learning signal rather than treating it as a static retrieval tool. Unlike prior approaches that rely primarily on fixed labeled or pseudo-labeled datasets, MEMENTO enables agents to accumulate both factual domain knowledge and procedural research strategies across repeated interactions with the web. It operates at two levels: within each session, an Adaptive Exploration Tree (AET) iteratively decomposes tasks into sub-questions and reflects on intermediate answers to guide further search; across sessions, dual-channel memory separates declarative facts and procedural research heuristics to enable cross-session learning without model fine-tuning. Evaluated on sales automation and legal research tasks with strict temporal filtering to prevent data leakage, MEMENTO achieves significant performance gains over ReAct baselines, demonstrating that repeated web interaction can bootstrap expertise and more efficient search in data-scarce settings.
The key novelty lies in reflecting on research findings mid-session to adapt search dynamically and accumulating domain expertise spanning both "what" is known and "how" to search effectively. Empirical results show up to +25.6% improvement on sales automation and +36.5% on legal research over web-enabled ReAct agents, highlighting the value of cross-session learning of research strategies in addition to factual knowledge. This contrasts with prior episodic web agents that do not persist knowledge or strategy. MEMENTO's interpretable memory representations also enable human-readable accumulation of evolving domain expertise and search heuristics.
Key findings
- MEMENTO improves the sales automation coverage score by a relative +25.6% (0.579 vs 0.461) compared to the ReAct + Qwen baseline.
- MEMENTO improves legal research accuracy by a relative +36.5% (0.808 vs 0.592) over the ReAct + Qwen baseline on the JUSTICE dataset with search cutoff.
- Adaptive Exploration Tree (AET) alone contributes large gains: +19.7% on sales and +29.6% on legal research vs ReAct baseline.
- Cross-session memory adds further gains on top of AET (+4.9% sales, +5.3% legal research with Qwen).
- Strict temporal search cutoffs (6 months for sales, 2 years for legal) prevent direct leakage of ground truth, supporting the claim that the gains reflect learned expertise rather than memorized outcomes.
- MEMENTO requires no model fine-tuning; improvements arise solely from accumulated declarative and procedural memory used as prompt context.
- Performance gains hold across backbone models: Qwen (open source) and GPT-5-mini (frontier LLM).
- MEMENTO operates effectively from only 60 training examples in both low-data domains.
Threat model
There is no adversary in the security sense; the setting is defined by the agent's operating constraints. The agent works under low-data supervision, with few labeled examples and no direct access to ground-truth outcomes. It can query the web, but only for content published before a fixed historical cutoff, so it cannot retrieve explicit answers. It cannot modify its internal weights during learning and must accumulate knowledge through interaction with web content across sessions. Malicious or adversarial web content is not explicitly handled.
Methodology — deep read
Threat Model & Assumptions: No explicit adversary is defined. Models lack large labeled datasets for the domain tasks and cannot rely on memorized outcomes, because web search is restricted by enforced temporal cutoffs. The agent cannot retrieve ground-truth outcomes directly but can access open web data published before the event, which simulates a realistic low-data research setting.
Data: Two public professional-domain datasets were used. Sales Automation (SDR-Bench) has 180 samples split into 60 training and 120 test; each sample is a seller-product-customer triple with associated value propositions. Legal Research (JUSTICE benchmark) has 180 U.S. Supreme Court cases that remain after filtering with a two-year prior search cutoff, also split 60/120 train/test. Labels are free-text value propositions for sales and the winning party for legal. Preprocessing consisted of temporal filtering and partitioning with fixed seeds.
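The preprocessing described above (temporal filtering plus a fixed-seed 60/120 partition) could look roughly like the following. This is a minimal sketch, not the authors' pipeline: `WebDoc`, `search_cutoff`, `filter_by_cutoff`, and `split_train_test` are hypothetical names, and reading the cutoff as "event date minus a fixed window" is an assumption.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class WebDoc:
    """Hypothetical record for a retrieved web page."""
    url: str
    published: date


def search_cutoff(event_date, window_days):
    """Latest publication date the agent may see for one sample.

    Assumption: the cutoff is the event date minus a fixed window,
    e.g. roughly 730 days for the two-year legal cutoff.
    """
    return event_date - timedelta(days=window_days)


def filter_by_cutoff(docs, cutoff):
    """Keep only documents published strictly before the cutoff."""
    return [d for d in docs if d.published < cutoff]


def split_train_test(samples, n_train, seed):
    """Shuffle a copy with a fixed seed, then slice into train and test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

Calling `split_train_test(range(180), n_train=60, seed=0)` yields 60 training and 120 test items, and the same seed always reproduces the same partition.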
Architecture/Algorithm: MEMENTO comprises a two-level framework:
- Within-session Adaptive Exploration Tree (AET), which decomposes root queries into waves of sub-questions, dynamically adjusting future questions based on reflection over current findings stored in local session memory.
  - Sub-questions are solved by tool-augmented agents using actions: SEARCH_MEMORY (from declarative/procedural cross-session memory), SEARCH_WEB, SCRAPE_RESULTS, and GENERATE_ANSWER.
  - Bottom-up LLM synthesis aggregates answers from child to parent nodes for a final output.
- Dual-channel persistent cross-session memory:
  - Procedural Memory, split into Craft Knowledge (free-text research heuristics), Decomposition Rules (conditional question rewrite rules), and Web Action Rules (query and scraping protocols).
  - Declarative Memory, storing factual domain knowledge accumulated from prior sessions.
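The within-session loop above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the paper's implementation: the tree is kept one level deep, and `decompose`, `solve`, `reflect`, and `synthesize` are hypothetical callables standing in for LLM and tool calls. `max_waves` and `max_questions` are illustrative budget names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    question: str
    answer: str = ""
    children: List["Node"] = field(default_factory=list)


def run_aet(root_question, decompose, solve, reflect, synthesize,
            max_waves=3, max_questions=8):
    """Wave-based decomposition with reflection between waves."""
    root = Node(root_question)
    session_memory = []  # (question, answer) findings local to this session
    pending = decompose(root_question, session_memory)
    asked = 0
    for _ in range(max_waves):
        # Enforce the overall question budget before starting a wave.
        pending = pending[: max_questions - asked]
        if not pending:
            break
        for q in pending:
            child = Node(q, solve(q, session_memory))
            root.children.append(child)
            session_memory.append((q, child.answer))
        asked += len(pending)
        # Reflection: propose the next wave from what has been found so far.
        pending = reflect(root_question, session_memory)
    # Bottom-up synthesis from child answers to the root answer.
    root.answer = synthesize(
        root_question, [(c.question, c.answer) for c in root.children]
    )
    return root
```

The loop ends when the reflection step proposes no further questions or when either budget is exhausted, which keeps the session's cost bounded.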
Training Regime: Training proceeds in batches of b=60 samples. Each batch runs four steps: forward passes with the current memory state, unsupervised consolidation (craft knowledge rewriting and web rule updates), scoring by an LLM judge against ground truth, and supervised reflection that updates the memory rules. No model weights are fine-tuned; all learning happens by updating the explicit memory stores. Budgets on question count and search steps are hyperparameters that bound search complexity. Experimental hardware is not specified.
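One way to picture a batch update is the sketch below. It only mirrors the order of the four steps; the field names on `Memory` follow the memory channels named in this summary, while `train_batch` and its callable arguments (`run_session`, `consolidate`, `judge`, `reflect`) are hypothetical placeholders for LLM-driven components.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProceduralMemory:
    craft_knowledge: str = ""  # free-text research heuristics
    decomposition_rules: List[str] = field(default_factory=list)
    web_action_rules: List[str] = field(default_factory=list)


@dataclass
class Memory:
    declarative: List[str] = field(default_factory=list)  # domain facts
    procedural: ProceduralMemory = field(default_factory=ProceduralMemory)


def train_batch(batch, memory, run_session, consolidate, judge, reflect):
    """Run one batch update; only `memory` changes, never model weights."""
    # 1. Forward passes with the current memory state.
    trajectories = [run_session(s["question"], memory) for s in batch]
    # 2. Unsupervised consolidation from trajectories alone.
    consolidate(memory, trajectories)
    # 3. Scoring each trajectory against its ground-truth label.
    scores = [judge(t, s["label"]) for t, s in zip(trajectories, batch)]
    # 4. Supervised reflection using the scores.
    reflect(memory, trajectories, scores)
    return sum(scores) / len(scores)
```

Because the memory object is plain text and lists, its state after each batch can be inspected directly, which matches the interpretability point made in the summary.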
Evaluation Protocol: Metrics differ by domain:
- Sales Automation is scored by an LLM judge on coverage of ground truth value propositions on a 0–5 scale, averaged over propositions.
- Legal Research is evaluated as accuracy of the predicted winning party.
Baselines: closed-book LLM, 5-shot in-context learning, ReAct with web search, ReAct + MEMENTO memory, AET only (without memory), and full MEMENTO (AET + memory). Ablations isolate the contributions of AET and memory. Neither statistical significance nor confidence intervals are reported.
Reproducibility: The paper names an open-source backbone LLM (Qwen) and a frontier model (GPT-5-mini), but does not explicitly state that code or memory stores are released. The datasets are public, but the filtered versions may not be. Replicating the temporal cutoff mechanism and batch update procedure may take substantial effort.
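The two metrics in the evaluation protocol can be sketched as follows. `coverage_score` and `accuracy` are hypothetical helper names. How the 0-5 judge scores map onto reported values such as 0.579 is not stated in this summary, so the sketch returns the raw mean without any normalization.

```python
def coverage_score(judge_scores):
    """Mean of per-proposition judge scores, each on the 0-5 scale."""
    if not judge_scores:
        raise ValueError("need at least one judged proposition")
    for s in judge_scores:
        if not 0 <= s <= 5:
            raise ValueError(f"judge score out of range: {s}")
    return sum(judge_scores) / len(judge_scores)


def accuracy(predicted, actual):
    """Fraction of cases where the predicted winning party is correct."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```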
Example Walkthrough: Given a sales pitch query for Coca-Cola and Marketo products, MEMENTO starts by decomposing the question into waves (e.g., understanding Coca-Cola's martech budget, KPIs, marketing strategies). It consults procedural memory for crafted decomposition rules; if empty, it bootstraps with web searches. Then, it queries declarative memory before web search. Findings are reflected upon after each wave, generating further refined sub-questions. Final synthesis aggregates answers into a value proposition pitch. The entire trajectory updates cross-session memory to improve future research sessions.
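The memory-before-web ordering in the walkthrough can be sketched for a single sub-question. Several things here are assumptions made for illustration: keyword overlap stands in for whatever retrieval the system actually uses, the web is skipped entirely on a memory hit, and `search_web`, `scrape`, and `generate_answer` are hypothetical callables.

```python
def keyword_overlap(question, fact):
    """Count shared lowercase tokens; a crude stand-in for retrieval."""
    return len(set(question.lower().split()) & set(fact.lower().split()))


def solve_subquestion(question, declarative_memory, search_web, scrape,
                      generate_answer, min_overlap=2, max_pages=2):
    """Answer one sub-question, consulting memory before the web."""
    actions = ["SEARCH_MEMORY"]
    evidence = [
        fact for fact in declarative_memory
        if keyword_overlap(question, fact) >= min_overlap
    ]
    if not evidence:
        # Memory miss: fall back to web search, then scrape the top hits.
        actions.append("SEARCH_WEB")
        urls = search_web(question)[:max_pages]
        if urls:
            actions.append("SCRAPE_RESULTS")
            evidence = [scrape(u) for u in urls]
    actions.append("GENERATE_ANSWER")
    return generate_answer(question, evidence), actions
```

Returning the action trace alongside the answer makes it easy to count how many web calls a populated memory avoids.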
Technical innovations
- Adaptive Exploration Tree (AET) that dynamically decomposes tasks and reflects between waves to adapt question generation and search paths within a session.
- Dual-channel persistent cross-session memory splitting declarative factual knowledge from procedural research strategies, enabling separate accumulation and evolution of domain expertise and search heuristics.
- Unsupervised batch-level consolidation and supervised reflection that update both memory stores from execution trajectories and ground truth without any model fine-tuning.
- Tool-augmented agents that consult cross-session memory before searching the web to reduce expensive web calls, iterating between local session memory and web exploration for efficient knowledge acquisition.
Datasets
- SDR-Bench — 180 samples (60 train / 120 test) — publicly available customer success stories for sales automation
- JUSTICE benchmark — 180 filtered U.S. Supreme Court cases (60 train / 120 test) — sourced from Oyez, filtered for temporal leakage
Baselines vs proposed
- ReAct + Qwen: coverage score = 0.461 vs MEMENTO + Qwen: 0.579 (+25.6%) on Sales Automation
- ReAct + Qwen: accuracy = 0.592 vs MEMENTO + Qwen: 0.808 (+36.5%) on Legal Research
- AET + Qwen: coverage score = 0.552 vs ReAct + Qwen: 0.461 (+19.7%) on Sales Automation (isolating AET impact)
- AET + Qwen: accuracy = 0.767 vs ReAct + Qwen: 0.592 (+29.6%) on Legal Research
- MEMENTO + Qwen vs AET + Qwen (isolating memory impact): coverage score = 0.579 vs 0.552 (+4.9%) on Sales Automation, and accuracy = 0.808 vs 0.767 (+5.3%) on Legal Research
- 5-shot Qwen: 0.375 coverage vs 5-shot Qwen + MEMENTO: 0.382 on Sales Automation (small gains)
- Closed-book Qwen: 0.416 coverage vs ReAct + Qwen: 0.461 on Sales Automation (some gain from web access)
- ReAct + GPT-5-mini: 0.522 coverage vs MEMENTO + GPT-5-mini: 0.547 (+4.8%) on Sales Automation
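The percentages above are relative gains over the baseline score, not absolute point differences. A short check using only the numbers reported in this list (`relative_gain` is an illustrative helper name):

```python
def relative_gain(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# (baseline, proposed) pairs copied from the comparisons listed above.
reported = {
    "Sales, ReAct -> MEMENTO (Qwen)": (0.461, 0.579),
    "Legal, ReAct -> MEMENTO (Qwen)": (0.592, 0.808),
    "Sales, ReAct -> AET (Qwen)": (0.461, 0.552),
    "Legal, ReAct -> AET (Qwen)": (0.592, 0.767),
    "Sales, AET -> MEMENTO (Qwen)": (0.552, 0.579),
    "Legal, AET -> MEMENTO (Qwen)": (0.767, 0.808),
    "Sales, ReAct -> MEMENTO (GPT-5-mini)": (0.522, 0.547),
}

for name, (base, prop) in reported.items():
    print(f"{name}: {relative_gain(base, prop):+.1f}%")
```

For example, the sales result is (0.579 - 0.461) / 0.461, which rounds to +25.6%, while the absolute difference is 0.118 points.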
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.29795.

Fig 1 (page 1): Overview of MEMENTO for a single training sample.

Figs 3-8 (page 4).
Limitations
- The study focuses on two professional domains; generalization to other low-data domains remains to be validated.
- No adversarial or real-world robustness evaluation against noisy or malicious web content was conducted.
- No statistical significance testing reported for improvements; gains may vary with different dataset splits.
- The temporal filtering prevents data leakage but reduces dataset size, potentially limiting training diversity.
- Details on computational cost, latency, and efficiency trade-offs of iterative AET search are not provided.
- Memory accumulation depends on the quality and coverage of initial web trajectories; cold start scenarios may perform worse.
Open questions / follow-ons
- How well does MEMENTO generalize to domains with less structured or noisier web content?
- Can reinforcement learning or other automated optimization further improve procedural memory without labeled data?
- What are the efficiency trade-offs and cost implications of the AET exploration strategy in larger-scale deployments?
- Could MEMENTO incorporate adversarial robustness to misinformation or improve source credibility filtering?
Why it matters for bot defense
For practitioners in bot-defense and CAPTCHA, MEMENTO offers insights into leveraging iterative web feedback loops to improve domain expertise in data-scarce environments without model retraining. The dual-channel memory approach separating facts from procedural heuristics suggests new avenues for persistent, interpretable agent memory, which could apply to designing agents that learn and adapt detection or challenge strategies over time. Dynamic, reflective decomposition of queries via the Adaptive Exploration Tree also informs approaches to multi-step information gathering and reasoning when facing constrained labeled data. Although the paper focuses on professional and research domains, core concepts around persistent, interpretable learning from web trajectories could inspire CAPTCHA systems that adapt challenge difficulty or selection using accumulated cross-session knowledge of adversarial behavior or bot tactics, improving robustness while minimizing continuous retraining costs.
Cite
@article{arxiv2605_29795,
title={MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains},
author={Ashutosh Ojha and Vinay Aggarwal and Ashutosh Srivastava and Siddharth Yedlapati and Yaman K Singla and Jitendra Ajmera},
journal={arXiv preprint arXiv:2605.29795},
year={2026},
url={https://arxiv.org/abs/2605.29795}
}