EcoGEO: Trajectory-Aware Evidence Ecosystems for Web-Enabled LLM Search Agents

Source: arXiv:2605.12887 · Published 2026-05-13 · By Hengwei Ye, Jiasheng Mao, Zhenhan Guan, Zheng Tian

TL;DR

Current Generative Engine Optimization (GEO) methods optimize individual webpages to increase visibility or influence in generative search, but they ignore how web-enabled large language model (LLM) agents actively construct evidence across multiple search steps and webpages. The authors shift the unit of analysis to the evidence ecosystem and introduce EcoGEO, which treats optimization as an environment-level influence problem: shaping the agent's browsing trajectory and evidence synthesis. They propose TRACE, a Trajectory-Aware Coordinated Evidence Ecosystem that builds a controlled multi-page evidence environment around a fictional target product, coordinating a navigation entry page with heterogeneous but consistent support pages connected by internal links. On OPR-Bench, a controlled open-ended product recommendation benchmark with 3,124 query-product pairs, TRACE substantially outperforms page-level GEO baselines in final target recommendation rate. Trajectory-level analysis shows that TRACE increases early target crawls, target-specific follow-up searches, and internal-link navigation, which indicates that the improvement comes from shaping agent browsing rather than from simply adding more content. An ablation shows that both coordinated evidence and the navigation page design contribute to the gains. Overall, the work argues for understanding and optimizing search agent trajectories through coordinated evidence ecosystems rather than isolated pages.

Key findings

TRACE increased final target recommendation rates by 31.3 (SafeSearch), 15.7 (E-Commerce), and 14.9 (E-GEO) percentage points over the strongest page-level GEO baseline on each dataset.
Initial target-result crawls improved from 9.4%-43.8% in baselines to 42.2%-61.2% under TRACE across datasets, indicating faster agent entry into target evidence.
Target-specific second search queries rose from 7.8%-20.3% in baselines to 21.9%-29.7% with TRACE, showing improved agent focus on the target after initial exposure.
Only TRACE induced non-trivial internal-link crawling (9.4%-20.4%), evidencing deeper intra-ecosystem navigation rather than isolated page access.
Follow-up target-result crawl rates were not always higher under TRACE than baselines, suggesting repeated exposure to a single page is less influential than coordinated browsing.
Page-level GEO methods (C-SEO, E-GEO, AutoGEO) often failed to significantly outperform the unoptimized single-page baseline, showing limits of isolated content rewriting.
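In controlled crawl exposure ablations, coordinating the multi-page evidence improved target recommendation over uncoordinated multi-page evidence from 75.0% to 82.8% on SafeSearch and from 81.8% to 87.6% on E-Commerce; adding a navigation-style entry page raised it further to 89.1% and 93.4%, respectively.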
The navigation entry page substantially increased internal-link crawling from near 0% to above 25%, highlighting its role in encouraging evidence network traversal.

Threat model

The adversary is a content creator or system operator seeking to increase the influence of a target product in the recommendations of web-enabled LLM search agents. They can construct and control coordinated multi-page evidence ecosystems with consistent attributes, internal links, and controlled snippets. They cannot alter open-web ranking beyond injecting a synthetic target result at a fixed, controlled exposure position, and they cannot manipulate the agent's model internals or query generation. In the study, no false content is published on the open web: the adversary is limited to synthetic controlled environments so that evaluation stays safe.

Methodology — deep read

The authors study web-enabled LLM search agents operating in an interactive loop: at each step, agents can SEARCH (issue queries to Google Search API), CRAWL (fetch webpage content of a selected link from aggregated search results), or ANSWER (generate the final recommendation). Each query-product benchmark instance pairs an open-ended product recommendation query with a fictional but plausible target product not found on the open web, enabling controlled exposure experiments without external confounds.
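The interactive loop described above can be sketched as a small driver. This is a minimal illustration, not the authors' implementation: `run_agent`, the `policy(observations, must_answer)` signature, and the `search` and `crawl` callables are hypothetical names, and the default budgets of five queries and five crawls are taken from the experimental setup reported in this summary.

```python
from typing import Callable, List, Tuple

# Hypothetical action encoding: ("SEARCH", query), ("CRAWL", url), ("ANSWER", text).
Action = Tuple[str, str]


def run_agent(
    policy: Callable[[list, bool], Action],
    search: Callable[[str], list],
    crawl: Callable[[str], str],
    max_queries: int = 5,
    max_crawls: int = 5,
) -> List[Action]:
    """Run one SEARCH/CRAWL/ANSWER episode and return the action trajectory."""
    observations: list = []
    trajectory: List[Action] = []
    queries = crawls = 0
    answer = None
    # At most max_queries + max_crawls tool calls can precede the answer.
    for _ in range(max_queries + max_crawls):
        kind, arg = policy(observations, False)
        if kind == "ANSWER":
            answer = arg
            break
        if kind == "SEARCH" and queries < max_queries:
            queries += 1
            observations.append(("results", search(arg)))
        elif kind == "CRAWL" and crawls < max_crawls:
            crawls += 1
            observations.append(("page", crawl(arg)))
        else:
            break  # requested tool is over budget (or unknown): stop browsing
        trajectory.append((kind, arg))
    if answer is None:
        # Budget exhausted: ask the policy for a final answer.
        _, answer = policy(observations, True)
    trajectory.append(("ANSWER", answer))
    return trajectory


def toy_policy(observations: list, must_answer: bool) -> Action:
    """Toy stand-in for the LLM: search once, crawl the top result, answer."""
    if must_answer or len(observations) >= 2:
        return ("ANSWER", "recommend: example product")
    if not observations:
        return ("SEARCH", "best wireless earbuds")
    return ("CRAWL", observations[-1][1][0])


trajectory = run_agent(
    toy_policy,
    search=lambda q: ["https://example.test/a"],
    crawl=lambda url: "page text for " + url,
)
print([kind for kind, _ in trajectory])
```

Returning the full action list rather than only the answer is what makes trajectory-level analysis possible later on.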

The dataset, OPR-Bench, combines 3,124 query-product pairs from SafeSearch, E-Commerce, and E-GEO query sources, filtering to clear recommendation intents and pairing each query with a constructed target description.

They contrast three evidence-environment conditions: 1) a Single-Page baseline (one official product page generated from the target product description Pdesc), 2) Page-Level GEO baselines (C-SEO, E-GEO, and AutoGEO), which optimize the text of a single page for citation and snippet quality, and 3) TRACE, which builds a coordinated multi-page evidence graph GP consisting of a navigation entry page and heterogeneous support pages (official, review, expert, news, forum, social). These pages share consistent product attributes and are connected by cross-page internal links.

In TRACE, the navigation entry page acts as a gateway that summarizes product attributes, mimics evaluation-focused language, and exposes structured links to role-specialized support pages. Support pages provide complementary heterogeneous evidence. Internal links connect these pages to encourage multi-page crawl trajectories. All pages derive their wording/layout/evidence framing from the same Pdesc.
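One way to picture the resulting structure is as a small page graph. The sketch below is an assumption-laden illustration: the `Page` class, `build_ecosystem`, the example attribute values, the `example.test` URLs, and the fully connected link topology among support pages are all hypothetical. The source only states that the navigation page links to role-specialized support pages, that pages are interlinked, and that all pages share consistent attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Support page roles as listed in the summary.
SUPPORT_ROLES = ["official", "review", "expert", "news", "forum", "social"]


@dataclass
class Page:
    role: str
    url: str
    attributes: Dict[str, str]          # product facts this page asserts
    links: List[str] = field(default_factory=list)  # internal links


def build_ecosystem(attributes: Dict[str, str], base_url: str) -> Dict[str, Page]:
    """Build one navigation page plus one support page per role."""
    pages = {
        role: Page(role, f"{base_url}/{role}", dict(attributes))
        for role in SUPPORT_ROLES
    }
    nav = Page(
        "navigation",
        f"{base_url}/",
        dict(attributes),
        links=[p.url for p in pages.values()],
    )
    # Assumed topology: each support page links back to the navigation
    # page and to every other support page.
    for page in pages.values():
        page.links = [nav.url] + [o.url for o in pages.values() if o is not page]
    pages["navigation"] = nav
    return pages


def is_consistent(pages: Dict[str, Page]) -> bool:
    """True if every page asserts exactly the same product attributes."""
    reference = pages["navigation"].attributes
    return all(p.attributes == reference for p in pages.values())


# Hypothetical attribute values for the fictional product shown in the figures.
ecosystem = build_ecosystem(
    {"name": "ClearTone Pulse", "category": "example category"},
    "https://example.test/cleartone-pulse",
)
print(len(ecosystem), is_consistent(ecosystem))
```

Copying the same attribute dictionary into every page mirrors the requirement that heterogeneous pages never contradict each other on product facts.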

Experiments use GPT-5.1 as the agent backbone with a "9+1" controlled exposure protocol: in the initial search round, the agent sees nine open-web distractors plus one synthetic target-related result placed fifth. Follow-up searches return either open-web distractors plus one synthetic target page, or ecosystem-local results restricted to the constructed support page pool. The agent decides what to search and crawl, within a budget of at most five queries and five crawls in total, before answering.
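The initial-round "9+1" exposure can be illustrated with a few lines of list handling. The function name `inject_target` and the placeholder result strings are hypothetical; only the counts (nine distractors, one target) and the fifth-place position come from the description above.

```python
from typing import List


def inject_target(distractors: List[str], target: str, position: int = 5) -> List[str]:
    """Insert one synthetic target result at a fixed 1-based rank."""
    if len(distractors) != 9:
        raise ValueError("the 9+1 protocol expects exactly nine distractors")
    if not 1 <= position <= 10:
        raise ValueError("position must be between 1 and 10")
    results = list(distractors)          # do not mutate the caller's list
    results.insert(position - 1, target)
    return results


open_web = [f"distractor-{i}" for i in range(1, 10)]
serp = inject_target(open_web, "synthetic-target")
print(serp)
```

Fixing the rank removes ranking as a confound, so differences between conditions can be attributed to the evidence environment rather than to where the target happened to appear.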

Evaluation metrics combine final-answer performance (explicit target recommendation rate) and trajectory-level indicators: initial target-result crawl rate, target-specific second search query rate, follow-up target-result crawl rate, and internal-link crawl rate.
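A rough sketch of how such indicators could be computed from logged trajectories follows. The `Step` schema and the exact indicator definitions are assumptions made for illustration; the paper's own definitions may differ, and the two toy trajectories are invented data, not results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    kind: str                        # "SEARCH", "CRAWL", or "ANSWER"
    search_round: int                # 1 = initial search round
    is_target: bool = False          # query/page/answer concerns the target
    via_internal_link: bool = False  # crawl reached by following an internal link


def indicators(steps: List[Step]) -> Dict[str, bool]:
    """Per-trajectory indicators (assumed definitions)."""
    searches = [s for s in steps if s.kind == "SEARCH"]
    crawls = [s for s in steps if s.kind == "CRAWL"]
    # Target pages crawled directly from search results.
    result_crawls = [c for c in crawls if c.is_target and not c.via_internal_link]
    return {
        "target_recommendation": any(s.kind == "ANSWER" and s.is_target for s in steps),
        "initial_target_crawl": any(c.search_round == 1 for c in result_crawls),
        "target_specific_second_query": len(searches) >= 2 and searches[1].is_target,
        "followup_target_crawl": any(c.search_round >= 2 for c in result_crawls),
        "internal_link_crawl": any(c.via_internal_link for c in crawls),
    }


def rates(trajectories: List[List[Step]]) -> Dict[str, float]:
    """Percentage of trajectories for which each indicator holds."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    per_traj = [indicators(t) for t in trajectories]
    n = len(per_traj)
    return {name: 100.0 * sum(p[name] for p in per_traj) / n for name in per_traj[0]}


# Two invented trajectories, for illustration only.
t1 = [
    Step("SEARCH", 1),
    Step("CRAWL", 1, is_target=True),
    Step("CRAWL", 1, via_internal_link=True),
    Step("SEARCH", 2, is_target=True),
    Step("ANSWER", 2, is_target=True),
]
t2 = [Step("SEARCH", 1), Step("CRAWL", 1), Step("ANSWER", 1)]
summary = rates([t1, t2])
print(summary)
```

Keeping the four trajectory indicators separate from the final-answer rate is what lets an analysis distinguish "the agent saw more target content" from "the agent browsed differently".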

They run main experiments on the full 3,124-instance benchmark, with ablations on smaller subsets using forced initial crawl exposure to isolate downstream effects.

This design makes it possible to measure how evidence environment structure and page coordination influence not just what content is available, but also how the agent acquires evidence and synthesizes recommendations over multiple interactive steps.

Methodological constraints include using fictional products for safe and reproducible testing, fixed budgets for query/crawl steps, and controlled injection of target pages to analyze agent browsing behavior systematically.

Technical innovations

EcoGEO: A conceptual shift framing generative engine optimization (GEO) as an ecosystem-level problem focusing on the browsing trajectory and evidence environment rather than isolated page-level influence.
TRACE framework: Construction of a trajectory-aware coordinated evidence ecosystem combining a navigation entry page and multiple heterogeneous yet attribute-consistent support pages interconnected via internal links.
Controlled benchmark OPR-Bench: A first open-ended product recommendation benchmark pairing recommendation queries with fictional target products for reproducible agent evaluation.
A controlled exposure protocol injecting synthetic target results at a fixed rank into search results combined with controlled ecosystem-local retrieval for follow-up target-specific searches.

Datasets

OPR-Bench — 3,124 query-product pairs — Combined from SafeSearch (64), E-Commerce (121), and E-GEO (2,939) public query sources with fictional target products constructed for evaluation.
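As a quick arithmetic check on the composition above, the short script below sums the three source counts and computes each source's share of the benchmark. The variable names are illustrative; the counts are the ones stated in this section.

```python
# Query-product pair counts per source, as listed above.
sources = {"SafeSearch": 64, "E-Commerce": 121, "E-GEO": 2939}

total = sum(sources.values())
shares = {name: 100.0 * n / total for name, n in sources.items()}

for name, n in sources.items():
    print(f"{name:<11} {n:>5}  {shares[name]:5.1f}%")
print(f"{'total':<11} {total:>5}")
```

The counts add up to the stated 3,124 pairs, and E-GEO contributes roughly 94% of them, so any pooled average would be dominated by that source. That is one reason to read the per-dataset numbers rather than a single aggregate.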

Baselines vs proposed

Single-Page Baseline: Target Recommendation = 28.1%-56.2% across datasets vs TRACE 67.2%-73.9%
C-SEO Page-Level GEO: Target Recommendation = 35.9%-56.0% vs TRACE 67.2%-73.9%
E-GEO Page-Level GEO: Target Recommendation = 35.9%-58.8% vs TRACE 67.2%-73.9%
AutoGEO Page-Level GEO: Target Recommendation = 28.1%-53.7% vs TRACE 67.2%-73.9%
Ablation on SafeSearch: Uncoordinated multi-page = 75.0%, Coordinated = 82.8%, TRACE = 89.1%
Ablation on E-Commerce: Uncoordinated multi-page = 81.8%, Coordinated = 87.6%, TRACE = 93.4%
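The two ablation rows can be decomposed into the contribution of each design step. The snippet below is plain arithmetic on the numbers listed above; the dictionary keys and the `gains` helper are hypothetical names.

```python
# Target recommendation rates (%) from the two ablation rows above.
ablation = {
    "SafeSearch": {"uncoordinated": 75.0, "coordinated": 82.8, "trace": 89.1},
    "E-Commerce": {"uncoordinated": 81.8, "coordinated": 87.6, "trace": 93.4},
}


def gains(row: dict) -> dict:
    """Percentage-point gain attributable to each step of the ablation."""
    return {
        "coordination": round(row["coordinated"] - row["uncoordinated"], 1),
        "navigation_page": round(row["trace"] - row["coordinated"], 1),
    }


deltas = {dataset: gains(row) for dataset, row in ablation.items()}
for dataset, d in deltas.items():
    print(f"{dataset}: coordination +{d['coordination']} pp, "
          f"navigation page +{d['navigation_page']} pp")
```

On these figures, coordination adds 7.8 points on SafeSearch and 5.8 on E-Commerce, and the navigation entry page adds a further 6.3 and 5.8 points, so both components contribute gains of similar size.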

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.12887.

Fig 2: Navigation page example for ClearTone Pulse.

Fig 3: Official page example for ClearTone Pulse.

Fig 4: News page example for ClearTone Pulse.

Fig 5: Social page example for ClearTone Pulse.

Fig 6: Forum page example for ClearTone Pulse.

Fig 7: Expert page example for ClearTone Pulse.

Fig 8: Review page example for ClearTone Pulse.

Limitations

Experiments rely on fictional products and synthetic target pages rather than real-world web content, potentially limiting ecological validity.
Controlled exposure and fixed insertion of target results simplify real search environments where rankings and indexing impact visibility.
The study focuses on product recommendations; generalization to other query types or domains is not demonstrated.
Use of a single LLM backbone (GPT-5.1) leaves open whether findings hold for other models or agent architectures.
No adversarial agent behavior or attempts to evade influence by the evidence ecosystem were evaluated.
Limited discussion of long-term agent adaptation or impact of evolving search index changes on ecosystem effectiveness.

Open questions / follow-ons

How do real-world variations in search ranking and indexing affect EcoGEO effectiveness when synthetic exposure injection is removed?
Can EcoGEO and TRACE concepts scale to other domains beyond product recommendation, such as medical or legal queries?
How robust is coordinated evidence optimization to adaptive or adversarial agents that alter browsing strategies over time?
What automated methods could generate coordinated multi-page ecosystems at web scale without fictional targets or manual construction?

Why it matters for bot defense

This work is relevant to bot-defense and CAPTCHA practitioners because it clarifies how web-enabled LLM search agents acquire and synthesize evidence: through multi-step browsing trajectories shaped by coordinated evidence environments, not by isolated pages. From a bot-defense perspective, the finding that LLM agents follow internal navigation links and exploit multi-page structure suggests new ways to detect, and potentially manipulate, agent browsing patterns. For CAPTCHA design, ecosystem-level optimization shows how an adversary might engineer an entire evidence space to bias automated recommendations, which implies that defending against such multi-step manipulation requires context and trajectory analysis beyond single-page content. The trajectory-level metrics introduced in the paper can also help measure bot behavior and influence channels, informing defenses and CAPTCHAs designed to interrupt an agent's evidence accumulation path.

Cite

bibtex

@article{arxiv2605_12887,
  title={EcoGEO: Trajectory-Aware Evidence Ecosystems for Web-Enabled LLM Search Agents},
  author={Hengwei Ye and Jiasheng Mao and Zhenhan Guan and Zheng Tian},
  journal={arXiv preprint arXiv:2605.12887},
  year={2026},
  url={https://arxiv.org/abs/2605.12887}
}
