KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

Source: arXiv:2606.17034 · Published 2026-06-15 · By Mufei Li, Shikun Liu, Dongqi Fu, Haoyu Wang, Yinglong Xia, Hong Li et al.

TL;DR

KVEraser targets the cost of post-hoc localized context erasing in long-context large language model (LLM) inference. When a span in the cached KV states is found to be erroneous, stale, or harmful only after prefill, exact erasing requires recomputing the entire suffix cache, at a latency that grows quadratically with suffix length. KVEraser instead learns to build a surrogate KV cache: it replaces the KV states of the erased span alone with learned steering states and reuses the rest of the cache unchanged. This turns an expensive suffix-dependent recomputation into a span-dependent local edit. The authors train a transferable eraser module with a two-stage pipeline that combines generic span-neighbor pretraining with task-specific fine-tuning. On synthetic long-context benchmarks (1K–32K tokens) and natural long-document question answering with factual distractors, KVEraser nearly matches the accuracy of exact full recomputation, while its latency grows by only 24% from 1K to 32K tokens of context, versus 17.6× for full recomputation. It also outperforms the other approximate baselines in both quality and efficiency and generalizes to tasks unseen in training.

Key findings

KVEraser matches the exact-match accuracy of full recomputation across 1K to 32K token contexts on the Needle-in-a-Haystack benchmark (Fig. 3), unlike the other approximate methods.
Latency increases by only 24% for KVEraser from 1K to 32K context size, versus a 17.6× latency increase for full suffix recomputation.
On natural long-document QA datasets (2WikiMultiHopQA, MuSiQue, IIRC), KVEraser achieves the highest exact match among approximate methods at 3–4× speedup over full recompute (Fig. 4).
Delete-and-shift and local suffix repair baselines exhibit severe errors such as outputting irrelevant values or multiple conflicting answers, demonstrating cache contamination issues.
Instruction-only forgetting increasingly outputs concatenated erased and retained values as context size grows, failing to erase influence.
KVEraser’s training with generic span-neighbor pretraining (80K samples) and targeted fine-tuning (~7.5K samples) enables generalizable erasing across domains.
Approximate baselines degrade in quality or scale poorly with longer contexts, whereas KVEraser reliably maintains near-perfect erasing performance.
Local KV replacement using learned steering states circumvents expensive suffix recomputation by operating on erased-span length rather than suffix length.

Threat model

The paper defines no explicit adversary; the threat is implicit in the use case. An erroneous, stale, or harmful span is discovered in the context after the initial prefill, and it contaminates future decoding through the cached KV states. The defender (the serving system) aims to erase that span's influence locally, without costly full recomputation. The setting assumes the source of the invalid span cannot modify the prefix cache before erasing, cannot control queries issued after erasing, cannot evade detection, and cannot manipulate the cache beyond the initial inclusion of the span. Adversarial inputs designed to fool the eraser are not explicitly considered.

Methodology — deep read

The paper studies post-hoc localized context erasing in long-context transformer LLMs that use KV caching. In this setting, a user or system detects a short erroneous span in a cached long context after prefill and wants to remove its influence on future decoding without recomputing the suffix, which may be very long.

The input context decomposes as x = p ⊕ e ⊕ s (prefix, erased span, suffix). Full recomputation requires re-running prefill on suffix s under edited prompt p ⊕ s due to causal attention dependencies, causing compute cost that scales quadratically with suffix length.
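A rough cost model makes the asymmetry concrete. The sketch below counts only query-key attention pairs and ignores layers, heads, and MLP cost; both function names and the example lengths are illustrative assumptions, not quantities from the paper.

```python
def recompute_attention_pairs(prefix_len: int, suffix_len: int) -> int:
    """Query-key pairs for re-running prefill on the suffix s of the edited prompt p ⊕ s.

    Suffix token i (0-based) attends to the whole prefix and to suffix tokens 0..i.
    """
    return sum(prefix_len + i + 1 for i in range(suffix_len))


def span_edit_attention_pairs(prefix_len: int, span_len: int) -> int:
    """Query-key pairs for an edit that only processes the erased span e.

    Assumption: span token i attends to the prefix and to span tokens 0..i,
    and the suffix cache is reused without any recomputation.
    """
    return sum(prefix_len + i + 1 for i in range(span_len))


if __name__ == "__main__":
    prefix_len, span_len = 500, 100  # illustrative lengths only
    local = span_edit_attention_pairs(prefix_len, span_len)
    for suffix_len in (1_000, 8_000, 32_000):
        full = recompute_attention_pairs(prefix_len, suffix_len)
        print(f"suffix={suffix_len:>6}  recompute={full:>12}  span edit={local}")
```

Under this toy count, doubling the suffix roughly quadruples the recomputation term, while the span-edit term does not depend on the suffix at all.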

KVEraser’s core innovation is to learn a function that constructs a surrogate KV cache: it replaces only the KV states of the erased span e with a learned "steering" KV block and reuses the prefix and suffix KV states unchanged. After this local edit, future decoding behaves as if e had been deleted, with no suffix recomputation.

Architecturally, KVEraser consists of:

a frozen generator p_θ (the original pretrained LLM)
a trainable eraser module E_ϕ, which is a copy of the generator's transformer backbone excluding the LM head

E_ϕ is conditioned on the preserved prefix KV cache KV_{1:m-1}(x) and the erased span e, and outputs replacement KV states for positions m to n. This replacement is concatenated with the original prefix and original suffix KV caches to create the surrogate cache \widehat{KV}(x; m, n).
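The assembly step can be sketched in a few lines. This is a minimal illustration, assuming a per-position list as a stand-in for the real per-layer key/value tensors; `build_surrogate_cache`, `toy_eraser`, and the `KVState` alias are hypothetical names, and the toy eraser only returns placeholders where a real one would run a transformer backbone.

```python
from typing import Callable, List, Sequence, Tuple

KVState = Tuple[str, int]  # toy stand-in for one position's key/value tensors


def build_surrogate_cache(
    cache: Sequence[KVState],
    tokens: Sequence[str],
    m: int,
    n: int,
    eraser: Callable[[Sequence[KVState], Sequence[str]], List[KVState]],
) -> List[KVState]:
    """Replace cache positions m..n (1-indexed, inclusive) with steering states."""
    if len(cache) != len(tokens):
        raise ValueError("cache and tokens must cover the same positions")
    if not 1 <= m <= n <= len(cache):
        raise ValueError("span must satisfy 1 <= m <= n <= len(cache)")
    prefix_kv = list(cache[: m - 1])       # KV states of the prefix p, reused as-is
    suffix_kv = list(cache[n:])            # KV states of the suffix s, reused as-is
    span_tokens = list(tokens[m - 1 : n])  # the erased span e
    steering_kv = eraser(prefix_kv, span_tokens)
    if len(steering_kv) != n - m + 1:
        raise ValueError("eraser must return one state per erased position")
    return prefix_kv + list(steering_kv) + suffix_kv


def toy_eraser(prefix_kv: Sequence[KVState], span_tokens: Sequence[str]) -> List[KVState]:
    """Hypothetical placeholder for the learned eraser module."""
    start = len(prefix_kv) + 1
    return [("steer", start + i) for i in range(len(span_tokens))]


if __name__ == "__main__":
    tokens = ["a", "b", "BAD", "BAD", "c", "d"]
    cache = [("kv", i) for i in range(1, len(tokens) + 1)]
    print(build_surrogate_cache(cache, tokens, m=3, n=4, eraser=toy_eraser))
```

The surrogate cache has the same length as the original, so suffix positions keep their indices and no suffix entry is touched.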

The training objective encourages the frozen generator's decoding from the surrogate cache to match decoding from the exact edited prompt (p ⊕ s) using teacher forcing and cross-entropy loss.
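The loss itself is ordinary teacher-forced cross-entropy. The sketch below assumes the per-step next-token distributions have already been produced by the frozen generator decoding from the surrogate cache; which token sequence serves as the target is not specified in this summary, so `target_tokens` is an assumption, as is the function name.

```python
import math
from typing import Mapping, Sequence


def teacher_forced_cross_entropy(
    step_distributions: Sequence[Mapping[str, float]],
    target_tokens: Sequence[str],
) -> float:
    """Mean negative log-likelihood of the target tokens.

    step_distributions[t] is the next-token distribution at step t, obtained
    with the target tokens before t fed as inputs (teacher forcing).
    """
    if len(step_distributions) != len(target_tokens) or not target_tokens:
        raise ValueError("need one distribution per target token")
    total = 0.0
    for dist, token in zip(step_distributions, target_tokens):
        prob = dist.get(token, 0.0)
        if prob <= 0.0:
            return math.inf  # the target token received zero probability
        total -= math.log(prob)
    return total / len(target_tokens)
```

In training, only the eraser's parameters would receive gradients from this loss, since the generator is frozen.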

To train the eraser, a two-stage process is used:

Generic span-neighbor pretraining: samples are constructed by randomly inserting 100-token spans into long Wikipedia documents. The eraser learns to suppress erased span influence while preserving access to retained context by predicting neighboring tokens across the erased span.
Task-specific fine-tuning: ~7.5K samples drawn from the synthetic Needle-in-a-Haystack (NIAH) benchmark and from factual-distractor QA datasets (Natural Questions, TriviaQA, HotpotQA) adapt the eraser to realistic erasing scenarios.
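The first stage's sample construction can be sketched as follows. This is a guess at the mechanics under stated assumptions: the source of the inserted span, the choice of the suffix as the prediction target, and the names `make_span_neighbor_sample`, `context`, and `target` are all hypothetical.

```python
import random
from typing import Dict, Sequence


def make_span_neighbor_sample(
    doc_tokens: Sequence[str],
    inserted_span: Sequence[str],
    rng: random.Random,
) -> Dict[str, object]:
    """Insert a span at a random cut point and record its 1-indexed bounds m..n."""
    if len(doc_tokens) < 2 or not inserted_span:
        raise ValueError("need at least two document tokens and a non-empty span")
    cut = rng.randrange(1, len(doc_tokens))  # keeps prefix and suffix non-empty
    prefix = list(doc_tokens[:cut])
    suffix = list(doc_tokens[cut:])
    span = list(inserted_span)
    return {
        "context": prefix + span + suffix,  # x = p ⊕ e ⊕ s
        "m": cut + 1,                       # first erased position
        "n": cut + len(span),               # last erased position
        "target": suffix,                   # assumed target: tokens following the span
    }


if __name__ == "__main__":
    doc = [f"w{i}" for i in range(1000)]
    sample = make_span_neighbor_sample(doc, ["x"] * 100, random.Random(0))
    print(sample["m"], sample["n"], len(sample["context"]))
```

Removing positions m..n from `context` recovers the original document, which is the prompt p ⊕ s that the eraser is meant to imitate.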

Evaluations are performed on:

Synthetic NIAH benchmark with controlled context sizes from 1K to 32K tokens.
Natural long-document QA datasets with harmful factual distractors unseen during training.

Baselines compared include:

Full recomputation
Delete-and-shift of KV cache
Instruction-only forgetting (keep cache but instruct model to ignore erased span)
Local suffix repair (recomputing 15% of suffix tokens near erased span or query).
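To see why a purely structural edit falls short, consider a toy version of the delete-and-shift baseline. The `CacheEntry` class and its `attended_to_span` flag are hypothetical bookkeeping added for illustration; real caches hold tensors, and a real implementation would also need to handle positional encodings when shifting.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class CacheEntry:
    position: int           # 1-indexed position in the context
    token: str
    attended_to_span: bool  # True if computed while the erased span was visible


def prefill(tokens: Sequence[str], m: int, n: int) -> List[CacheEntry]:
    """Toy prefill: under causal attention, every position after n saw span m..n."""
    return [CacheEntry(i, tok, i > n) for i, tok in enumerate(tokens, start=1)]


def delete_and_shift(cache: Sequence[CacheEntry], m: int, n: int) -> List[CacheEntry]:
    """Drop entries m..n and renumber the suffix; suffix contents are left untouched."""
    span_len = n - m + 1
    kept = [e for e in cache if not m <= e.position <= n]
    return [
        replace(e, position=e.position - span_len) if e.position > n else e
        for e in kept
    ]


if __name__ == "__main__":
    tokens = ["a", "b", "BAD", "BAD", "c", "d"]
    for entry in delete_and_shift(prefill(tokens, 3, 4), 3, 4):
        print(entry)
```

The span's entries are gone and positions are contiguous again, but every suffix entry still carries `attended_to_span=True`: its states were computed with the erased span in view, which is the contamination described in the key findings.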

Metrics are exact match on the correct value retrieved after erasing, and latency measured over the full query-and-decode pipeline, excluding the initial prefill.
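Both metrics are simple to compute. The sketch below assumes a whitespace- and case-insensitive comparison for exact match, which may differ from the paper's normalization; all function names are illustrative.

```python
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, reference: str) -> bool:
    """True only if the whole prediction equals the reference after normalization."""
    return normalize(prediction) == normalize(reference)


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that exactly match their reference."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need equally sized, non-empty inputs")
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return hits / len(references)


def latency_growth(latency_short_ctx: float, latency_long_ctx: float) -> float:
    """Growth factor between two context sizes, e.g. 1K and 32K tokens."""
    return latency_long_ctx / latency_short_ctx
```

A strict exact match penalizes the failure modes reported for the baselines: an answer that concatenates the erased and retained values does not match the retained value alone. In these terms, the reported 24% increase is a growth factor of 1.24, against 17.6 for full recomputation.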

The eraser is initialized from the generator's backbone and trained with teacher forcing using standard transformer recipes; the generator itself stays frozen. Testing covers various context lengths and erased-span positions. The paper does not explicitly mention a code or weight release.

A concrete example: for a long context containing two conflicting "needles" (key-value pairs), with the earlier needle designated as erased, KVEraser replaces the erased span KV states with learned steering KV states, allowing the model to answer queries as if the erased needle never appeared, without recomputing the entire suffix. This reduces latency from tens or hundreds of seconds down to a small fraction and maintains exact match accuracy identical to full recomputation.
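A toy builder for such a two-needle context might look like this. The sentence template, the query wording, and the name `make_conflicting_needles` are assumptions made for illustration and are not taken from the benchmark.

```python
import random
from typing import Dict, Sequence


def make_conflicting_needles(
    filler: Sequence[str],
    key: str,
    stale_value: str,
    valid_value: str,
    rng: random.Random,
) -> Dict[str, object]:
    """Place two needles for the same key; the earlier one is the span to erase."""
    if not filler:
        raise ValueError("need at least one filler sentence")
    sentences = list(filler)
    first, second = sorted(rng.sample(range(len(sentences) + 1), 2))
    # Insert the later needle first so the earlier insertion index stays valid.
    sentences.insert(second, f"The value of {key} is {valid_value}.")
    sentences.insert(first, f"The value of {key} is {stale_value}.")
    return {
        "context": sentences,
        "erased_index": first,  # index of the earlier (erased) needle
        "query": f"What is the value of {key}?",
        "answer": valid_value,  # expected answer once the earlier needle is erased
    }


if __name__ == "__main__":
    filler = [f"Filler sentence {i}." for i in range(20)]
    sample = make_conflicting_needles(filler, "alpha", "111", "222", random.Random(1))
    print(sample["erased_index"], sample["query"], sample["answer"])
```

A system that erases correctly should answer the query with the retained value only, whichever method produced the edited cache.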

Statistical significance and cross-validation details are not explicitly described, but evaluation metrics are averaged over sizeable held-out splits. Baseline failures are diagnosed qualitatively. The approach requires careful pretraining and fine-tuning to generalize across domains and long sequences. Overall, the method trades exact recomputation for a learned, local KV-space edit that approximates the behavior of a cache built without the erased span.

Technical innovations

Formulating post-hoc context erasing as a localized KV cache editing problem using learned steering KV states replacing only the erased span.
Introducing a two-stage training pipeline: generic span-neighbor pretraining plus task-specific fine-tuning to learn transferable erasing capabilities.
Parameterizing the eraser as a trainable copy of the frozen generator’s transformer backbone (excluding LM head) conditioned on preserved prefix KV and erased span.
Achieving inference-time complexity scaling with erased-span length rather than suffix length by reusing suffix KV states unchanged.
Training for behavioral matching: the eraser is optimized with teacher-forced cross-entropy so that the frozen generator decoding from the surrogate KV cache matches decoding from the edited prompt.

Datasets

Wikipedia chunks — 80K samples for generic span-neighbor pretraining — synthetic construction
Needle-in-a-Haystack (NIAH) synthetic benchmark — 1.2K total samples (200 per context size from 1K to 32K tokens) — synthetic
Natural Questions (Kwiatkowski et al., 2019), TriviaQA (Joshi et al., 2017), HotpotQA (Yang et al., 2018) — ~7.5K factual distractor samples for fine-tuning — filtered from public QA datasets

Baselines vs proposed

Full recomputation: exact match = ~1.0 at all context sizes vs KVEraser: ~1.0 exact match
Full recomputation latency growth: 17.6× increase from 1K to 32K context vs KVEraser: 24% increase
Delete-and-shift: exact match drops rapidly with context size, latency grows substantially vs KVEraser: stable exact match and low latency growth
Instruction-only forgetting: exact match decreases with context size due to persistent contamination vs KVEraser: stable high exact match
Local suffix repair (15% suffix recomputed): accuracy degrades with context size, latency similar or worse than full recompute vs KVEraser: better accuracy and latency
On unseen QA datasets (2WikiMultiHopQA, MuSiQue, IIRC) approximate baselines have lower exact match than KVEraser, which achieves best approximate quality with 3–4× speedup vs full recompute

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.17034.

Fig 1: Illustration examples of KVEraser application scenarios.

Fig 2: Pipelines of KVEraser. Snowflake and fire emojis were generated by GPT 5.2.

Fig 3 (page 2).

Fig 4 (page 2).

Fig 5 (page 2).

Fig 6 (page 2).

Fig 7 (page 2).

Fig 8 (page 2).

Limitations

Focuses only on single-span contiguous erasing; multiple span deletion is left for future work.
Requires specialized training of eraser module; applicability with different base models or architectures not evaluated.
No evaluation under adversarial or worst-case erasure scenarios; robustness to adversarial injections not studied.
Assumes access to original prefix cache and erased span KV states; may not generalize if prefix cache is partially corrupted or unavailable.
Code release and reproducibility details such as frozen weights and random seeds are not explicitly provided.
Effectiveness on other cache types (e.g., non-causal attention or non-transformer backbones) is untested.

Open questions / follow-ons

How does KVEraser perform when multiple disjoint spans require erasing concurrently or iteratively?
Can eraser training generalize to handle adversarially crafted poisoned spans designed to evade erasure?
What are the limits of transferring learned KV steering to different models, architectures, or domains without fine-tuning?
How robust is KVEraser when the prefix cache itself is degraded, partially corrupted, or missing KV entries?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, KVEraser offers a learned KV cache editing approach that could support efficient and accurate context sanitization in LLM-powered dialogue agents and retrieval-augmented systems. When a user’s prior input or a piece of retrieved evidence is identified post hoc as malicious, obsolete, or privacy-sensitive, KVEraser can erase that span’s influence locally, without the cost of full suffix recomputation. This improves latency and throughput for real-time applications that need dynamic context correction or content filtering. The results also show that cache state management matters: instruction-based 'forgetting' alone proved unreliable at erasing unwanted context. Security engineers should consider learned KV-space interventions a promising direction for countering manipulation or stale data in deployed LLM systems without sacrificing efficiency at scale.

Cite

@article{arxiv2606_17034,
  title={KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing},
  author={Mufei Li and Shikun Liu and Dongqi Fu and Haoyu Wang and Yinglong Xia and Hong Li and Hong Yan and Pan Li},
  journal={arXiv preprint arXiv:2606.17034},
  year={2026},
  url={https://arxiv.org/abs/2606.17034}
}
