Skip to content

What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems

Source: arXiv:2606.04425 · Published 2026-06-03 · By Yuanbo Xie, Tianyun Liu, Yingjie Zhang, Suchen Liu, Yulin Li, Liya Su et al.

TL;DR

This paper studies a novel security vulnerability in stateful agentic systems that extend large language models (LLMs) beyond session-bounded assistants into systems persisting context across multiple sessions. It identifies cross-session stored prompt injection (SPI) as a new class of attacks where adversarial instructions injected once into persistent system state — such as memories, filesystems, or tool descriptions — silently influence behavior in subsequent sessions. Unlike prior prompt injection work focused on ephemeral, session-local attacks, SPI causes persistent system-level compromise through contamination of long-lived agent context, decoupling injection and exploitation over time. The authors formalize SPI, offer a detailed taxonomy of how adversarial content can cross system boundaries, persist in different storage channels, be reintroduced via distinct incorporation mechanisms, and cause harms including fact manipulation, preference bias, and action misdirection. They develop a sandbox and benchmark suite enabling systematic evaluation of SPI risks across models, attack goals, and persistence routes. Quantitative results demonstrate that SPI broadens prompt injection from a fleeting model-level threat into a durable system vulnerability embedded in agent execution state. The paper highlights the need for secure context management as a core design principle for building trustworthy persistent agents, drawing attention to this new frontier of agent security concerns.

Key findings

  • Stored prompt injection (SPI) attacks persist by writing adversarial content into persistent context channels that survive beyond a single interaction session.
  • SPI attacks exploit two stages: unsafe persistent writes of attacker content and downstream reactivation by context incorporation mechanisms.
  • Directly loaded persistent context files like AGENTS.md or tool interface context present the strongest persistence and highest exploitability for SPI due to unconditional incorporation.
  • Conditionally persistent contexts, such as archival memories or workspace files, require agent retrieval to activate, allowing dormant adversarial content.
  • SPI enables temporally decoupled attacks where injection and exploitation occur in different sessions, users, or tasks — resembling stored XSS in web systems.
  • Persistence transforms prompt injection from an ephemeral threat into a system-level risk embedded in the execution environment state.
  • The authors provide the first taxonomy unifying injection sources (user input, external content, supply-chain tools), persistence channels (memory, tools, files), and incorporation methods (direct vs conditional).
  • Their benchmark and sandbox toolkit quantitatively evaluate SPI attack success across different target LLMs, attack goals, and persistence mechanisms.

Threat model

The adversary is an external untrusted user or attacker who can issue inputs crossing the agentic system boundary with the capability to write malicious content into persistent context channels such as memories, files, or tool metadata. The adversary cannot directly control the model M’s internal prompt construction or execution logic, nor rewrite run-time context arbitrarily. They rely solely on their ability to poison long-lived system state that the context construction pipeline incorporates in future executions, enabling temporally decoupled exploitation without requiring simultaneous interaction.

Methodology — deep read

  1. Threat Model & Assumptions: The authors consider adversaries who interact via external untrusted inputs crossing the system boundary. These adversaries possess a 'write primitive' to inject malicious instructions into persistent context channels but lack privileged access to internal components like the model, orchestration logic, or direct context construction steps. They cannot arbitrarily modify execution context at run-time but rely on contaminating persistent state that the context assembler later incorporates naturally.

  2. Data: No external data collection is described; rather, the study develops a taxonomy and constructs a benchmark and sandbox toolkit for testing various SPI attacks across different agentic system designs and levels of persistence. This test environment simulates real-world persistence channels such as working memory, archival memory, tool description context, and file-backed system files (AGENTS.md, USER.md, etc.) to evaluate vulnerability patterns.

  3. Architecture / Algorithm: The paper models the agent as a context construction pipeline where the language model input is assembled from six channel types: current user query, system instructions, session history, tool interface context, retrieved context, and file-backed context. These channels differ in persistence and loading behavior: ephemeral (session-bound), conditionally persistent (retrieval gated), and strongly persistent (always loaded). SPI vulnerability arises from adversarial content being written into persistent context and later incorporated via 'direct loading' (unconditional inclusion) or 'conditional loading' (retrieval or tool invocation). This abstraction enables systematic enumeration of injection vectors and persistence mechanisms.

  4. Training Regime: As a security and threat modeling study, there is no machine learning training involved. Instead, the authors implement adversarial injection tactics and evaluate their activation across agentic system variants. They varied factors including persistence channel type, incorporation mode, and agent model to measure exploitability.

  5. Evaluation Protocol: Evaluation uses the constructed benchmark and sandbox to test attack success rates across multiple attack goals (fact manipulation, preference manipulation, action control). They consider different injection sources and persistence vectors with both direct and conditional loading. They measure whether adversarial content successfully contaminates persistent state and leads the agent to deviate from intended behavior in a later session with no adversary interaction. This end-to-end testing validates the temporal decoupling from injection to activation and quantifies security risk expansion.

  6. Reproducibility: The authors release their benchmark and sandbox code to enable future work (anonymous 4open science repository). The paper discusses the abstract system model and provides enough detail to replicate the taxonomy and testing methods in custom agentic architectures. Exact datasets are not involved since the focus is on attack method generalization rather than data-driven modeling.

Technical innovations

  • Formalization of cross-session stored prompt injection (SPI) as a system-level vulnerability in persistent agentic systems, distinct from traditional session-bound prompt injection.
  • Development of a detailed taxonomy decomposing SPI along three orthogonal dimensions: injection source, persistent context channel, and context incorporation mechanism.
  • Introduction of a benchmark and sandbox toolkit enabling systematic, quantitative evaluation of SPI attack success across varied models, attack goals, and persistence channels.
  • Identification of the critical security risks as emerging from the lifecycle of persistent context management: unsafe writes combined with downstream reactivation, rather than solely from model prompt parsing.

Baselines vs proposed

  • Prior prompt injection research: evaluated mainly as single-session attacks with immediate effect, no cross-session persistence.
  • Proposed SPI evaluation: demonstrates persistent adversarial context can survive across multiple sessions, reactivated under different users or contexts, substantially broadening exploitability (quantitative metrics not fully detailed in excerpt).

Limitations

  • The evaluation is conceptual and benchmark-driven rather than measured on large-scale real agentic deployments or with real-world adversaries.
  • No exploration of sophisticated adversarial defenses or comprehensive mitigation strategies beyond suggesting secure persistent state management.
  • Limited coverage of complex multi-agent or distributed scenarios where cross-system contamination may be more challenging.
  • The sandbox and benchmark code availability is anonymous at present, which may affect immediate reproducibility outside the authors' group.
  • Attack impact quantification is discussed generally with some examples, but exact large-scale empirical data on attack success rates or false positives is not detailed.

Open questions / follow-ons

  • What system design principles and enforcement mechanisms can effectively prevent unsafe persistent writes or malicious downstream reactivation in agentic systems?
  • How can detection or sanitization of adversarial content in persistent context be achieved robustly without compromising agent usability or memory utility?
  • What impact does distributed or federated persistence of context have on SPI risks and mitigation complexity?
  • How do variations in retrieval heuristics, file loading policies, and tool orchestration affect the exploitability and detectability of SPI attacks?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, this work highlights that emerging stateful agentic systems expand the attack surface beyond immediate sessions into persistent context contamination. Defenses must shift from solely scrutinizing immediate inputs to also safeguarding persistence mechanisms—memories, files, tools—that agentic LLMs utilize across sessions. Practitioners should consider design strategies that rigorously control write access to persistent state and monitor unexpected reactivation of stored adversarial instructions. The taxonomy and benchmark provided by this paper offer a framework to systematically evaluate long-term system-level prompt injection risks beyond classic single interaction prompt attacks. This is critical as agentic systems increasingly mediate automated workflows and user interactions across time, raising complex challenges for maintaining aligned, trustworthy LLM behaviors in the presence of cross-session contamination vectors.

Cite

bibtex
@article{arxiv2606_04425,
  title={ What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems },
  author={ Yuanbo Xie and Tianyun Liu and Yingjie Zhang and Suchen Liu and Yulin Li and Liya Su and Tingwen Liu },
  journal={arXiv preprint arXiv:2606.04425},
  year={ 2026 },
  url={https://arxiv.org/abs/2606.04425}
}

Read the full paper

Last updated:

Articles are CC BY 4.0 — feel free to quote with attribution