SCI-Defense: Defending Manipulation Attacks from Generative Engine Optimization

Source: arXiv:2605.21948 · Published 2026-05-21 · By Xucheng Yu, Haibo Jin, Huimin Zeng, Haohan Wang

TL;DR

This paper addresses a class of attacks on LLM-based product ranking systems, termed Generative Engine Optimization (GEO) attacks. In a GEO attack, an adversary injects semantic content into a product description to manipulate the LLM ranker and artificially boost the product's rank. Existing defenses such as perplexity filters, content safety classifiers, and paraphrasing fail to detect these attacks because they rely on surface-level features or on harmful intent, both of which fluent GEO injections lack. The authors propose SCI-Defense, a three-part framework: perplexity-based detection to catch anomalous token patterns, Semantic Integrity Scoring (SIS) to evaluate specific dimensions of semantic manipulation, and Inter-Candidate Detection (ICD), which uses cross-candidate embedding similarity. SIS operationalizes persuasion signals through four dimensions (Authority Attribution, Narrative Purposiveness, Comparative Claims, and Temporal Claims) to identify text written to sway LLM rankers.

Evaluated on 600 Amazon product descriptions and 600 MS MARCO web passages, SCI-Defense achieves perfect precision and a zero false positive rate while maintaining high recall (1.0 for String attacks, ~0.95 for Reasoning attacks, and ~0.83 for Review-style attacks). It outperforms state-of-the-art baselines, which fail completely on semantic manipulations. The authors also identify new black-box attacks that manipulate factual relevance signals rather than persuasion signals; these evade SCI-Defense, exposing a structural blind spot and an area for future research. Overall, the results indicate that defending against semantic ranking manipulation requires detection techniques sensitive to textual intent and structure, not merely statistical or toxicity cues.

Key findings

  • SCI-Defense achieves Precision=1.000 and False Positive Rate=0.000 across 1,200 evaluations combining Amazon product and MS MARCO web passage datasets.
  • Recall by attack type on Amazon product descriptions: 1.000 for String attacks, 0.952 for Reasoning attacks, and 0.830 for Review attacks.
  • Baseline defenses—perplexity filters, SafetyClf content classifiers, and paraphrasing—achieve zero recall against semantic GEO attacks.
  • Perplexity threshold τ_ppl=500 effectively detects String attacks with perfect recall due to their anomalously high perplexity (>500), while fluent semantic attacks remain below 50 and evade perplexity filters.
  • Semantic Integrity Scoring evaluates 4 manipulation dimensions with empirically tuned weights: Authority Attribution (0.30), Narrative Purposiveness (0.25), Comparative Claims (0.25), and Temporal Claims (0.20).
  • Inter-Candidate Detection alone yields a high FPR of 0.74, but when combined with SIS and perplexity detection in SCI-Defense the overall FPR is 0.000.
  • New black-box GEO attacks (Specification Amplification, Use-Case Saturation, etc.) receive SIS scores of 0.35–0.42, below the detection threshold (τ_s=0.55), so they evade SCI-Defense and yield Block@3=0.000.
  • Ablation studies confirm PPL alone detects String attacks; SIS detects Reasoning and Review attacks; ICD complements with cross-candidate anomalies but is insufficient alone.

Threat model

The adversary is a black-box operator who controls injected text appended to their own product descriptions with the goal of boosting rankings in an LLM-based ranker. They have no internal knowledge of the LLM or defense mechanisms and no ability to modify competitor text or access the user query. The attacker must produce text convincing to both the LLM and human buyers, ruling out incoherent or obviously adversarial injections. The attacker cannot perform white-box gradient attacks or token-level poisoning and is constrained to natural, semantically fluent manipulations.

Methodology — deep read

The paper focuses on defending against Generative Engine Optimization (GEO) attacks targeting LLM-based product rankers, which output a ranking π based on product descriptions.

The threat model assumes a black-box adversary controlling injected text appended to owned product descriptions to manipulate rankings, without access to ranker internals or competitors' data. The adversary must produce semantically natural text convincing to both the LLM ranker and human buyer, ruling out incoherent jailbreak-style attacks.

Data: The primary benchmark, ProductBench, comprises 600 Amazon product descriptions spread evenly across 6 categories (Automotive, Electronics, Home & Kitchen, Toys & Games, Computers & Accessories, Industrial & Scientific). Each description is independently attacked with three GEO attack types (String, Reasoning, Review), for a total of 1,800 attack instances. A secondary evaluation uses 600 MS MARCO web passages across 6 domains. Each instance is labeled as legitimate or attacked. 20% of the Automotive data is held out for validation.

Architecture: SCI-Defense is a pipeline with three components operating sequentially:

  1. Perplexity Detection (PPL): Computes GPT-2 token-level perplexity over the entire description, including any injected text. If the perplexity exceeds τ_ppl=500, the candidate is immediately flagged as manipulated; this catches the statistical anomalies typical of String attacks.
  2. Semantic Integrity Scoring (SIS): For fluent semantic attacks that evade PPL, GPT-4o scores the text on 4 dimensions: Authority Attribution (presence of certifications, expert endorsements), Narrative Purposiveness (text structured to persuade), Comparative Claims (explicit superiority statements), and Temporal Claims (urgency signals). The four scores are linearly combined with weights (λ_AA=0.30, λ_NP=0.25, λ_CA=0.25, λ_TC=0.20), and an amplification boost is applied if any individual dimension exceeds its threshold, so that strong single-dimension manipulations are flagged.
  3. Inter-Candidate Detection (ICD): Calculates embedding similarity across all competing products to detect anomalous similarity patterns caused by competitor references. ICD alone is noisy (high FPR), so it is combined with SIS as S_final = α * SIS + (1-α) * ICD, with α optimized on the validation split.
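The scoring arithmetic in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: only the four SIS weights and the fusion formula come from the summary. The boost threshold, boost size, value of α, and the choice of maximum cosine similarity as the ICD signal are all hypothetical placeholders, and the dimension scores are assumed to lie in [0, 1].

```python
import math
from typing import Dict, List, Sequence

# Weights reported in the summary for the four SIS dimensions.
SIS_WEIGHTS: Dict[str, float] = {
    "authority_attribution": 0.30,
    "narrative_purposiveness": 0.25,
    "comparative_claims": 0.25,
    "temporal_claims": 0.20,
}

# ASSUMPTIONS (values not given in the source).
BOOST_THRESHOLD = 0.8  # hypothetical per-dimension trigger
BOOST_AMOUNT = 0.1     # hypothetical additive boost
ALPHA = 0.7            # hypothetical fusion weight


def sis_score(dims: Dict[str, float]) -> float:
    """Weighted sum of dimension scores, plus a boost if any dimension is high."""
    base = sum(SIS_WEIGHTS[name] * dims[name] for name in SIS_WEIGHTS)
    if any(dims[name] > BOOST_THRESHOLD for name in SIS_WEIGHTS):
        base += BOOST_AMOUNT
    return min(base, 1.0)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def icd_score(target: Sequence[float], competitors: List[Sequence[float]]) -> float:
    """Hypothetical ICD signal: highest similarity to any competitor, floored at 0."""
    if not competitors:
        return 0.0
    return max(0.0, max(cosine(target, c) for c in competitors))


def final_score(sis: float, icd: float, alpha: float = ALPHA) -> float:
    """S_final = alpha * SIS + (1 - alpha) * ICD."""
    return alpha * sis + (1.0 - alpha) * icd
```

With these placeholder values, a description scoring 0.9 on Authority Attribution and 0.2 elsewhere has a weighted sum of 0.41 and, after the boost, an SIS of 0.51.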

Detection and Penalization: Final scores are compared against two thresholds, τ_m (manipulated) and τ_s (suspicious). Any flagged product is moved to the last rank position, which penalizes manipulation conservatively. The thresholds are set to keep FPR at 0 because of the commercial risk of false positives.
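The demotion step can be illustrated with a short sketch. It assumes that both the "suspicious" and "manipulated" labels count as flagged, that comparisons are inclusive, and that several flagged products keep their relative order at the bottom; none of these details is specified in the summary. τ_s=0.55 is the value reported above, while τ_m=0.75 is a hypothetical placeholder.

```python
from typing import Dict, List

TAU_S = 0.55  # suspicious threshold reported in the summary
TAU_M = 0.75  # ASSUMPTION: hypothetical manipulated threshold


def label(score: float) -> str:
    """Map a final score to a label using the two thresholds."""
    if score >= TAU_M:
        return "manipulated"
    if score >= TAU_S:
        return "suspicious"
    return "clean"


def penalize(ranking: List[str], scores: Dict[str, float]) -> List[str]:
    """Move every flagged product to the end, preserving relative order."""
    kept = [p for p in ranking if label(scores[p]) == "clean"]
    demoted = [p for p in ranking if label(scores[p]) != "clean"]
    return kept + demoted
```

For example, if the ranker returns `["p1", "p2", "p3"]` and only `p1` has a final score above τ_s, the defended ranking becomes `["p2", "p3", "p1"]`.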

Training: SIS weights, thresholds, and boost rules are tuned by grid search on the Automotive validation split to maximize F1 subject to zero false positives.
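A constrained grid search of this kind might look like the sketch below, shown for a single threshold. The function names, the grid, and the tie-breaking rule (the first threshold with the best F1 wins) are illustrative assumptions; the paper's actual search also covers weights and boost rules.

```python
from typing import List, Optional, Sequence, Tuple


def f1_and_fpr(scores: Sequence[float], labels: Sequence[int], tau: float) -> Tuple[float, float]:
    """Return (F1, false positive rate) when flagging scores >= tau. Label 1 = attacked."""
    tp = fp = fn = tn = 0
    for score, y in zip(scores, labels):
        flagged = score >= tau
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return f1, fpr


def tune_threshold(scores: Sequence[float], labels: Sequence[int], grid: List[float]) -> Optional[float]:
    """Pick the grid threshold with the highest F1 among those with zero false positives."""
    best_tau, best_f1 = None, -1.0
    for tau in grid:
        f1, fpr = f1_and_fpr(scores, labels, tau)
        if fpr == 0.0 and f1 > best_f1:
            best_tau, best_f1 = tau, f1
    return best_tau
```

The hard constraint on FPR means a threshold with better F1 is rejected outright if it flags even one legitimate validation item, which matches the stated priority on avoiding false positives.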

Evaluation metrics include Recall, Precision, F1-score, False Positive Rate, and attack-block metrics like Block@3 (fraction of attacks where the manipulated product is not in top-3 ranks after defense).
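Block@3 as defined above can be computed with a few lines. This sketch assumes each attack instance yields one post-defense ranking (best first) and one attacked product identifier; the function and argument names are illustrative.

```python
from typing import List


def block_at_k(rankings: List[List[str]], attacked_ids: List[str], k: int = 3) -> float:
    """Fraction of attack instances whose attacked product is outside the top k."""
    if not rankings:
        raise ValueError("at least one attack instance is required")
    blocked = sum(
        1 for ranking, attacked in zip(rankings, attacked_ids)
        if attacked not in ranking[:k]
    )
    return blocked / len(rankings)
```

A value of 1.000 means no attacked product reached the top 3 after the defense; 0.000 means every attacked product did.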

The paper evaluates all components separately and combined, comparing against strong baselines (PPL-only, SafetyClf classifier, and paraphrasing defenses).

One concrete example: A product description first undergoes PPL scoring; if its perplexity is below 500, the text is passed to GPT-4o with specialized prompts that score the four semantic dimensions. The weighted sum plus boost yields the SIS score, which is combined with the ICD score from cross-product similarity. The final score determines whether the description is flagged or passed, which in turn affects the product's rank position.
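The control flow of this example can be sketched as below. Perplexity is computed from per-token natural-log probabilities, which in practice would come from GPT-2; here they are passed in directly so the sketch stays self-contained. The SIS and ICD scores are taken as precomputed inputs. Applying τ_s to the fused score and the value of α are assumptions, not details confirmed by the summary.

```python
import math
from typing import Sequence

TAU_PPL = 500.0  # perplexity threshold reported in the summary
TAU_S = 0.55     # detection threshold reported in the summary
ALPHA = 0.7      # ASSUMPTION: hypothetical fusion weight


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean natural-log token probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def classify(token_logprobs: Sequence[float], sis: float, icd: float) -> str:
    """PPL gate first; otherwise threshold the fused SIS/ICD score."""
    if perplexity(token_logprobs) > TAU_PPL:
        return "manipulated"  # statistical anomaly, typical of String attacks
    s_final = ALPHA * sis + (1.0 - ALPHA) * icd
    return "flagged" if s_final >= TAU_S else "pass"
```

Because the perplexity gate runs first, a high-perplexity description never reaches the GPT-4o scoring stage, which avoids an LLM call for the cheapest-to-detect attack class.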

No code or weights are currently released; the ProductBench dataset is cited, but its public availability is not confirmed. The evaluation spans diverse product categories and also tests generalization to non-commercial web passage text.

The authors detail extensive threshold tuning and ablations to ensure zero false positives with high semantic attack recall, as false suppression of legitimate products would cause serious commercial harm.

Technical innovations

  • Introduction of a semantic integrity scoring system (SIS) that measures manipulation intent via four distinct semantic persuasion dimensions (Authority Attribution, Narrative Purposiveness, Comparative Claims, Temporal Claims) scored by GPT-4o.
  • Combination of complementary orthogonal signals in a three-component defense: high-perplexity filtering for statistical attacks (PPL), semantic persuasion signal detection (SIS), and cross-candidate embedding similarity anomaly detection (ICD).
  • A novel composite boosting mechanism that amplifies suspicion when any single SIS dimension surpasses thresholds, improving detection sensitivity without raising false positives.
  • Identification and formalization of new black-box GEO attacks (Specification Amplification, Use-Case Saturation) that exploit a structural blind spot in persuasion-based semantic detection by inflating factual relevance signals instead.

Datasets

  • ProductBench — 600 product descriptions across 6 Amazon categories (Automotive, Electronics, Home & Kitchen, Toys & Games, Computers & Accessories, Industrial & Scientific) — from CORE [3]
  • MS MARCO — 600 web passages across 6 informational domains (Technology, Science, Health & Medicine, Law & Government, Finance & Economics, History & Culture) — publicly available

Baselines vs proposed

  • PPL Filter: Recall on semantic GEO attacks = 0.0 vs SCI-Defense: Recall String=1.0, Reasoning=0.952, Review=0.830
  • SafetyClf content classifier: Recall = 0.0 across all GEO attack types vs SCI-Defense Recall up to 1.0
  • Paraphrasing defense: Recall = 0.0 and introduces FPR=0.027 on legitimate descriptions vs SCI-Defense FPR=0.000
  • SCI-Defense achieves Block@3 = 1.000 for the three benchmark attack types (String, Reasoning, Review); all baselines achieve Block@3 = 0.000 on semantic attacks

Limitations

  • SCI-Defense relies on access to product description text and does not use query context or competitor descriptions, limiting detection of some semantic relevance inflation attacks.
  • The discovered blind spot allows novel attacks inflating semantic relevance signals (Specification Amplification, Use-Case Saturation) that score below detection thresholds and evade SCI-Defense, resulting in zero block rate.
  • Evaluation uses only 600 product descriptions and 600 web passages; larger scale or real-world deployment could reveal different operational challenges or rare false positives.
  • The defense components, especially SIS, depend on GPT-4o semantic scoring, which may be costly and relies on black-box LLM APIs, limiting reproducibility and accessibility.
  • No adaptive-attacker evaluation is reported for the final system, including query-aware or competitor-aware attacks that might further evade detection.
  • FPR=0 is prioritized via threshold tuning on domain-specific validation sets, suggesting thresholds require careful per-domain calibration for operational deployment.

Open questions / follow-ons

  • How to extend SCI-Defense to detect attacks exploiting semantic relevance inflation signals, potentially by incorporating query context or competitor data?
  • Can query-aware or cross-session analysis help identify subtle GEO manipulation that evades semantic persuasion signal detection?
  • What are the robustness and computational trade-offs of SIS reliance on large LLMs like GPT-4o, and can smaller or open models replicate these semantic integrity scores reliably?
  • How do GEO attacks evolve if attackers optimize specifically against SCI-Defense signals, and can adaptive adversarial training improve defense resilience?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners working on LLM-powered ranking or search systems, this work highlights a new manipulation vector: semantic content crafted to deceive LLM rankers while remaining plausible to humans. It shows that existing defenses such as perplexity filters and toxicity classifiers are inadequate against these attacks. Implementers should consider defenses sensitive to deeper semantic and rhetorical structure, not just surface anomalies or harmful content. The SCI-Defense framework offers actionable guidance on combining statistical, semantic, and contextual signals to detect manipulations that optimized adversaries may deploy. The identified blind spots also counsel caution: defenses lacking query integration or competitor context risk evasion through subtle factual inflation. CAPTCHA or bot-detection systems for e-commerce or content ranking must therefore integrate semantic intent analysis, possibly with cross-candidate or session-level monitoring, to robustly thwart manipulation attempts.

Cite

bibtex
@article{arxiv2605_21948,
  title={SCI-Defense: Defending Manipulation Attacks from Generative Engine Optimization},
  author={Xucheng Yu and Haibo Jin and Huimin Zeng and Haohan Wang},
  journal={arXiv preprint arXiv:2605.21948},
  year={2026},
  url={https://arxiv.org/abs/2605.21948}
}

Articles are CC BY 4.0 — feel free to quote with attribution