LLM Anonymization Against Agentic Re-Identification
Source: arXiv:2605.30848 · Published 2026-05-29 · By Ziwen Li, Jianing Wen, Tianshi Li
TL;DR
The paper addresses the novel challenge of anonymizing long-form text such as interview transcripts against agentic large language model (LLM) re-identification attacks augmented with web search. Traditional anonymization focuses on removing explicit identifiers but fails to handle contextual and quasi-identifying cues that agentic LLMs exploit by cross-referencing external web evidence. The key innovation is AURA, a mask-reconstruct anonymization framework that decouples privacy-sensitive span localization from downstream utility-preserving reconstruction. AURA adaptively expands a privacy scope tailored to each transcript to identify risky spans, then reconstructs masked spans through candidate generation and selection balancing privacy and utility. This approach is empirically evaluated using 27 real user interview transcripts vulnerable to re-identification, tested against three strong agentic web-search attacker models, and measured for both privacy success and multi-dimensional utility retention.
The results demonstrate that AURA's adaptive privacy scope variants reduce agentic re-identification to 0–5 of 27 transcripts across attackers, substantially below the named entity recognition (NER) baseline (13–21 re-identifications) and prior iterative LLM anonymizers (6–7 re-identifications). At the same time, AURA preserves 74.9–80.3% of unit-level contextual utility (profile, codebook, and utility-grid facts), outperforming differentially private text rewriting, which achieves near-zero re-identification only at drastically reduced utility. Moreover, on-device open models match or exceed API-based performance, supporting practical local deployment. The study highlights a new privacy-utility frontier for text anonymization in the LLM era: agentic attacker capabilities demand protection beyond explicit identifiers, while analytic richness must be preserved for downstream research use.
Key findings
- Adaptive-privacy AURA variants achieve 0–5/27 (0–18.5%) transcript re-identification across attacker models, notably lower than NER-based Presidio at 13–21/27 (48.1–77.8%).
- 8-attribute fixed-scope AURA variants reduce re-identifications to 2–8/27 (7.4–29.6%). Their best case is lower than prior iterative LLM anonymizers (6–7/27), but their worst case is slightly higher, and they trail the adaptive-scope variants.
- DP-MLM differential privacy methods achieve zero re-identification at low ε (0/27 at ε=10 and ε=30), but utility-grid unit recovery is as low as 0% at ε=10 and reaches at most 60.1% at ε=140.
- AURA preserves 74.9–80.3% of unit-level utility-grid information, with 95.1–96.8% codebook fact recovery, substantially higher than DP baselines and better than prior LLM anonymizers at ~72% unit-grid utility.
- Results are consistent across three attacker LLMs (GPT-5.1, GPT-5.4-mini, and Gemini-3-Flash), showing robustness to attacker variation and model mismatch.
- On-device open-weight Qwen3.5 models match or exceed API-powered anonymizers on utility (78.7–80.2% unit-grid recovery) at comparable privacy levels, enabling fully local anonymization workflows.
- Masking convergence, guided by adaptive privacy scope expansion, localizes the spans that are sensitive to re-identification, producing a privacy-risk map usable for manual or automated redaction and reconstruction.
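The percentages quoted in these findings follow directly from the counts out of 27 transcripts. The short check below recomputes a few of them; the function and dictionary names are illustrative and do not come from the paper.

```python
TOTAL_TRANSCRIPTS = 27  # size of the evaluation set reported in the summary


def reid_rate(count: int, total: int = TOTAL_TRANSCRIPTS) -> float:
    """Return a re-identification rate as a percentage, rounded to one decimal."""
    return round(100 * count / total, 1)


# (low, high) re-identification counts across attackers, as quoted above
reported = {
    "AURA adaptive scope": (0, 5),
    "AURA 8-attribute fixed scope": (2, 8),
    "Presidio (NER)": (13, 21),
}

rates = {name: (reid_rate(lo), reid_rate(hi)) for name, (lo, hi) in reported.items()}
for name, (lo, hi) in rates.items():
    print(f"{name}: {lo}% to {hi}%")
```

With only 27 transcripts, a single additional re-identification moves the rate by about 3.7 percentage points, which is worth keeping in mind when comparing methods whose ranges overlap.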
Threat model
The adversary is an agentic LLM system capable of generating queries based on weak contextual cues in text, performing web searches, retrieving external public information, and cross-referencing multiple evidence sources to re-identify individuals in anonymized transcripts. The adversary can infer personal attributes and quasi-identifiers beyond explicit named entities by leveraging online traces and linkable public data. However, the adversary does not have direct influence on the anonymization process, cannot modify the text before release, and must rely on inference and external search. The threat excludes black-box or white-box model poisoning or transcription manipulation attacks.
Methodology — deep read
The paper proposes AURA, a three-phase LLM-powered anonymization framework designed to balance privacy against agentic web-search re-identification and utility preservation.
Threat Model & Assumptions: The adversary is an agentic LLM equipped with web search capabilities that can exploit subtle contextual and quasi-identifying cues in text to cross-reference external public information, leading to re-identification of individuals in interview transcripts. The adversary tries to infer personal attributes beyond explicit identifiers, combining internal textual cues with online evidence. The adversary cannot tamper with the anonymization process but can query web search and perform multi-hop inference.
Data: The evaluation uses 27 real interview transcripts from the Anthropic Interviewer dataset selected for verified vulnerability to agentic re-identification out of an initial 1250 transcripts. These transcripts are information-rich, long-form, and represent real users, providing a challenging benchmark.
Architecture & Algorithm: AURA performs anonymization in three phases:
- Phase 0 Initialization: A base privacy scope of 8 attributes (age, sex, location, etc.) is adaptively expanded with transcript-specific quasi-identifiers, derived by a web-search-enabled agentic LLM attacker, to form a privacy scope A. An insight profile P is also created to capture the transcript's analytic utility features (thematic content, emotional expression, domain knowledge, etc.).
- Phase 1 Masking Convergence: An LLM rewrites the transcript for up to R rounds, each round conditioned on the attributes currently inferred by the attacker, and stops once no new re-identifying attributes emerge. The differences between the original and the converged rewrite define a masked template, with placeholders marking the identified privacy spans.
- Phase 2 Reconstruction & Selection: For the masked spans only, the reconstructor generates N candidate replacements that aim to preserve the utility profile while reducing privacy leakage. Each candidate rewrite is scored by an attribute inference attacker, which produces a privacy severity score S and a specificity count C for overly specific attributes, and by human-informed automatic judges, which produce a utility loss L. Candidates whose specificity count exceeds the threshold Cmax are discarded, and among the remaining candidates the one that minimizes privacy severity and utility loss is selected.
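The Phase 2 selection rule can be sketched as a filter followed by a minimization. This is a minimal illustration, not the authors' implementation: the summary does not say how S and L are combined, so the additive objective and the `weight` parameter are assumptions, as are the `Candidate` class, the example texts, and all scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    severity: float      # S: privacy severity from the attribute-inference attacker
    specificity: int     # C: count of overly specific attributes
    utility_loss: float  # L: utility loss from the automatic judges


def select_candidate(candidates, c_max, weight=1.0):
    """Drop candidates with C > c_max, then pick the lowest S + weight * L.

    The additive objective and `weight` are assumptions for illustration.
    Returns None when no candidate passes the specificity filter.
    """
    admissible = [c for c in candidates if c.specificity <= c_max]
    if not admissible:
        return None
    return min(admissible, key=lambda c: c.severity + weight * c.utility_loss)


# Hypothetical candidates for one masked span (scores are made up)
candidates = [
    Candidate("a mid-sized hospital in the Midwest", severity=0.6, specificity=3, utility_loss=0.1),
    Candidate("a regional healthcare provider", severity=0.3, specificity=1, utility_loss=0.2),
    Candidate("an organization", severity=0.1, specificity=0, utility_loss=0.9),
]
best = select_candidate(candidates, c_max=2)
print(best.text)
```

Here the first candidate is rejected for being too specific, and the third is private but loses most of the utility, so the middle candidate is chosen. A lexicographic rule (minimize S first, break ties on L) would be an equally plausible reading of the description.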
Training Regime: The approach leverages pre-trained LLMs (GPT-4.1, GPT-5.x, Qwen3.5-27B/35B) with prompting but no additional training reported. Iterations and candidate batch sizes are hyperparameters (R rounds masking, N candidates). Random seeds or hardware specifics are not detailed.
Evaluation Protocol: Privacy is measured as agentic re-identification success on the 27 transcripts against three attacker LLMs (GPT-5.1, GPT-5.4-mini, Gemini-3-Flash); each attack is repeated three times and the highest re-identification rate is reported. Utility is measured at three levels (interviewee profile facts, codebook facts, and joint utility-grid units) via automated fact recovery judged by LLMs and human expert coding. Baselines include NER-based redaction (Presidio), one-shot LLM rewriting (minimal and detailed prompts), the prior iterative LLM anonymizer [35], and differentially private masked language model rewriting (DP-MLM) across a range of ε privacy budgets.
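One way to read the privacy protocol is as a worst-case aggregation: for each attacker, keep the highest re-identification count over its repeated runs, then report the range across attackers. The sketch below assumes that reading; the attacker labels and per-run counts are hypothetical placeholders, not results from the paper.

```python
def worst_case_reid(runs_by_attacker):
    """Map each attacker to its highest re-identification count over repeated runs."""
    return {attacker: max(counts) for attacker, counts in runs_by_attacker.items()}


# Hypothetical per-run counts out of 27 transcripts (three runs per attacker)
runs = {
    "attacker_a": [1, 0, 2],
    "attacker_b": [0, 0, 0],
    "attacker_c": [4, 5, 3],
}

worst = worst_case_reid(runs)
reported_range = (min(worst.values()), max(worst.values()))
print(worst, reported_range)
```

Reporting the maximum rather than the mean is the conservative choice for a privacy metric, since a single successful attack run is enough to expose an interviewee.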
Reproducibility: Source code for AURA is released publicly. The Anthropic Interviewer data is proprietary but referenced. Prompts for all phases are documented (Appendix B). The attack and evaluation protocols are specified but exact LLM checkpoint versions or seeds are not fully detailed. Overall, the evaluation is reproducible given access to the data and models.
Technical innovations
- Adaptive privacy scope expansion using agentic web-search LLM attackers to localize transcript-specific quasi-identifiers beyond fixed named-entity categories.
- Decoupled mask-reconstruct anonymization framework that first localizes privacy-sensitive text spans via iterative masking then reconstructs them with constrained candidates balancing privacy severity and utility loss.
- Joint adversarial privacy and utility-retention scoring combining attribute inference, specificity audits, and multi-dimensional utility fact recovery to select optimal sanitized text candidates.
- Demonstration of practical local anonymization via open-weight LLMs matching API-powered baselines on privacy-utility tradeoffs.
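The mask step of the mask-reconstruct framework turns the differences between the original transcript and its converged rewrite into placeholders. A minimal sketch of that idea follows, assuming a whitespace-token diff computed with Python's standard `difflib`; the paper's actual span alignment method is not described in this summary, and the function name, placeholder string, and example sentences are illustrative.

```python
import difflib


def masked_template(original: str, rewritten: str, placeholder: str = "[MASK]") -> str:
    """Replace each span of `original` that the rewrite changed or removed with a placeholder."""
    orig_tokens = original.split()
    new_tokens = rewritten.split()
    matcher = difflib.SequenceMatcher(a=orig_tokens, b=new_tokens, autojunk=False)
    out = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(orig_tokens[i1:i2])  # untouched text is kept verbatim
        elif tag in ("replace", "delete"):
            out.append(placeholder)         # one placeholder per changed span
        # "insert": text added by the rewrite has no span in the original to mask
    return " ".join(out)


# Made-up example sentences, not taken from the dataset
original = "I work as a nurse at St. Mary's in Duluth and love it"
rewritten = "I work as a nurse at a hospital in the Midwest and love it"
print(masked_template(original, rewritten))
```

Because only the changed spans become placeholders, the reconstruction stage can leave the rest of the transcript verbatim, which is what lets the framework preserve utility outside the privacy-sensitive spans.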
Datasets
- Anthropic Interviewer dataset — 27 re-identifiable transcripts from original 1250 — proprietary internal dataset from Anthropic
Baselines vs proposed
- Presidio NER-based redaction: re-identification rate = 13–21/27 (48.1–77.8%) vs AURA adaptive privacy: 0–5/27 (0–18.5%)
- Iterative LLM anonymizer [35]: re-identification = 6–7/27 (22.2–25.9%) vs AURA 8-attribute fixed scope: 2–8/27 (7.4–29.6%)
- One-shot minimal LLM rewriting: re-identification = 8–17/27 (29.6–63.0%) vs AURA adaptive privacy: 0–5/27 (0–18.5%)
- DP-MLM (ε=10,30): re-identification = 0/27 but unit-level utility = 0–38% vs AURA adaptive: re-identification 0–5/27 and utility 74.9–80.3%
- AURA Qwen3.5-27B: utility-grid unit recovery 78.7% vs GPT-4.1 anonymizer baseline 72.1% at comparable privacy
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.30848.

Fig 1: AURA overview. Adaptive privacy scope expansion first augments a base re-identification

Fig 2 (page 2).

Fig 3 (page 2).

Fig 4: Screenshot of example utility-grid units in Huang et al. [14]. Each card pairs a validated
Limitations
- Evaluation limited to 27 transcripts vulnerable to agentic re-identification; small sample size may affect generalizability.
- Agentic re-identification attacks simulated using specific LLM models and prompt protocols; unknown if stronger or novel attackers might succeed more.
- Utility evaluation based on fact recovery and proxy metrics rather than direct human interpretive studies of qualitative research value.
- Differential privacy baselines evaluated with token-level perturbation methods, which may not represent all sophisticated DP text sanitization approaches.
- Masking and reconstruction based on prompt engineering with specific LLM checkpoints; robustness to different architectures or prompt variants not extensively tested.
- No explicit adversarial evaluation of reconstruction stage beyond privacy and utility scoring; possibility of overfitting to attacker models.
Open questions / follow-ons
- How to further improve robustness against evolving agentic attacker models with different search strategies or evidence sources beyond web search?
- Can the mask-reconstruct framework be extended with formal privacy guarantees such as differential privacy while maintaining utility?
- How to better quantify and preserve nuanced qualitative research utility including interpretive and readability dimensions beyond fact recovery?
- What are scalable semi-automatic workflows combining human-in-the-loop review with adaptive masking for sensitive real-world deployments?
Why it matters for bot defense
For bot-defense and CAPTCHA practitioners, this paper shows how large language models augmented with web search change the threat model for text anonymization. Simple redaction and surface-level perturbation no longer suffice, because agentic LLMs combine subtle contextual cues with external evidence to re-identify individuals. The AURA mask-reconstruct framework offers a promising direction: adaptively localize sensitive spans, then transform them to impede such attacks while preserving the textual utility that downstream tasks need. Because many web-abuse and data-privacy problems involve natural language data such as chat logs or user transcripts, practitioners can draw on this work to design anonymization pipelines that are explicitly tested against agentic attacks. Decoupling privacy span localization from utility-preserving reconstruction is especially well suited to layered defense, and the demonstrated feasibility of local deployment with open models reduces the risk of exposing sensitive data to remote APIs. Overall, the paper underscores the need for bot-defense engineers to adopt multi-stage, adaptive anonymization methods and realistic attacker models that incorporate external knowledge, rather than relying solely on classic NER or one-shot text rewriting.
Cite
@article{arxiv2605_30848,
  title={LLM Anonymization Against Agentic Re-Identification},
  author={Ziwen Li and Jianing Wen and Tianshi Li},
  journal={arXiv preprint arXiv:2605.30848},
  year={2026},
  url={https://arxiv.org/abs/2605.30848}
}