Disentangling Answer Engine Optimization from Platform Growth: A Log-Based Natural Experiment on ChatGPT Referral Traffic

Source: arXiv:2606.04362 · Published 2026-06-03 · By Keisuke Watanabe, Kazuki Nakayashiki

TL;DR

This paper asks how much Answer Engine Optimization (AEO), a practice analogous to SEO that targets referrals from large language model (LLM) answer engines such as ChatGPT, actually contributes to referral growth on a large, high-traffic web domain (glasp.co). The central difficulty is separating growth caused by AEO interventions from the rapid platform-wide growth (the "tailwind") in ChatGPT referrals overall. The authors exploit a natural experiment on glasp.co: AEO was applied to only one subset of pages (the /youtube/ Q&A pages), while the rest of the domain stayed untreated and served as a contemporaneous control exposed to the same external growth factors. Using first-party server logs and Google Analytics data rather than third-party estimators, they show that raw ChatGPT referral traffic to treated pages grew 6.1× over 5 months, but untreated pages on the same domain also grew 3.5×. An interrupted time-series regression on the treated/control traffic ratio, read as a difference-in-differences, estimates an intervention-attributable level increase of approximately 1.82× (95% CI 1.31–2.54) in ChatGPT referrals aligned with the AEO rollout, robust to engagement filtering and specification changes. However, a placebo-in-time permutation test yields p=0.16, so the evidence is suggestive rather than conclusive given the short, noisy pre-period. The authors find no treatment-specific SEO harm: organic Google clicks to treated pages fell about 25%, in line with a site-wide decline of about 20%. The key methodological contribution is showing that on-domain untreated controls are critical for isolating treatment effects from the platform tailwind, which implies that many public AEO growth claims substantially overstate causal impact.

Key findings

Raw monthly ChatGPT referrals to treated pages increased 6.1× from Jan to May 2026, while untreated control pages on the same domain increased 3.5× in the same period, reflecting a large platform tailwind.
The interrupted time-series model on weekly logged treated/control ratios estimates a discrete level increase of 1.82× in ChatGPT referrals immediately post-intervention (95% CI 1.31–2.54, HAC p=0.001).
Restricting the analysis to engagement-filtered sessions gives a comparable level-break estimate of 2.27×, with the level break remaining statistically significant.
A placebo-in-time permutation test yields a p-value of 0.16, indicating the effect is suggestive but not conclusive given short, noisy pre-intervention data.
Google Search Console data shows treated pages’ organic clicks fell about 25%, in line with an ambient 20% site-wide decline, implying no treatment-specific SEO penalty or deindexation.
Pre-intervention trends were present and significant (approximately +11% per month growth in the treated/control ratio), but the estimated intervention effect appears as a level shift rather than a slope change.
The combined intervention was a bundle including URL canonicalization, demand mining via 404 bot logs, title/summary rewriting into Q&A form, and an SEO-protection lockout for pages with existing organic clicks.
The study used first-party analytics and server logs over a corpus of hundreds of thousands of YouTube Q&A pages, enabling scale and robustness to third-party estimation noise.

Threat model

The paper does not specify an adversarial threat model; rather, the analysis considers the role of external platform-level growth (the 'tailwind') as a confounding factor that must be accounted for in causal inference of AEO treatment effects. The assumption is that any external platform growth similarly affects treated and control subsets multiplicatively and contemporaneously, enabling the ratio approach. The adversary here can be interpreted as confounding growth factors but not direct attacks or manipulations on analytics data.
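The multiplicative, contemporaneous tailwind assumption is what lets a ratio remove platform growth. A minimal sketch of the reasoning follows; the symbols $Y^{T}_t$, $Y^{C}_t$ (treated and control referrals in week $t$), $g_t$ (platform tailwind), and $a^{T}_t$, $a^{C}_t$ (group-specific factors) are notation introduced here for illustration, not taken from the paper.

```latex
\begin{align*}
Y^{T}_t &= g_t \, a^{T}_t, \qquad Y^{C}_t = g_t \, a^{C}_t \\
\log \frac{Y^{T}_t}{Y^{C}_t} &= \log a^{T}_t - \log a^{C}_t
\end{align*}
```

The common factor $g_t$ cancels, so any break in the log ratio reflects a change in the group-specific factors. If the tailwind reached treated and control pages with different strength, it would not cancel, and the ratio would still carry platform growth.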

Methodology — deep read

Threat Model & Assumptions: The study assumes that ChatGPT referral traffic can be decomposed into a platform-level tailwind growth common to all pages on the domain and an additional growth attributable to the AEO interventions. The key challenge is that the platform itself grows explosively over time, confounding before-after comparisons. The adversary is not explicitly modeled; rather the focus is on causal inference of treatment effect from observational data with internal controls.
Data: The data comes from a large high-traffic domain glasp.co containing hundreds of thousands of YouTube question-and-answer pages under the "/youtube/" path (treated group). The remaining site pages (long-form articles, profiles, discovery pages) form the untreated contemporaneous control. Data sources include first-party Google Analytics 4 (GA4) API session logs for ChatGPT referrals (sessionSource contains 'chatgpt') and Google Search Console for SEO metrics (clicks and impressions). The study covers 47 weeks (26 pre-intervention, 21 post-intervention) from July 2025 through May 2026. Data were cleaned to exclude non-human referrals via bot filtering and engagement thresholds.
Intervention: The AEO treatment was a bundled deployment in January 2026 on the /youtube/ corpus only, consisting of (a) URL canonicalization to consolidate duplicates, (b) demand mining from ChatGPT bot 404 request logs to create new pages targeting high-demand queries, (c) rewriting titles into question form and lead summaries into standalone answer snippets, prioritized by bot request volume, and (d) an SEO-protection rule that locked pages with meaningful Google clicks against rewriting and unpublished low-interest pages, with the aim of avoiding organic SEO harm.
Architecture / Algorithm: No machine learning model was introduced; the work focuses on causal measurement. Their key estimator is the interrupted time-series (ITS) segmented regression on the log ratio of treated to control weekly ChatGPT referrals. The model includes terms for baseline trend, a discrete level shift at intervention, and slope change after intervention. Newey-West HAC standard errors correct serial correlation and heteroskedasticity. A placebo-in-time permutation test re-estimates the level break in multiple pre-intervention periods to assess significance conservatively.
Training Regime: Not applicable, since no ML model is trained; the analysis was conducted on aggregate weekly logs. The pre-intervention period is the 26 weeks in the second half of 2025; the first half of 2025 was excluded because referral volume was too low and the series too noisy. The post-intervention period covers 21 weeks through May 2026. Robustness checks include filtering to engaged (non-bounce) sessions, excluding outlier spike weeks, and moving the pre-period start to September 2025.
Evaluation Protocol: Primary metric is the change in the treated/control ratio of weekly ChatGPT referral sessions on a logarithmic scale using ITS regression and difference-in-differences interpretation. Confidence intervals come from HAC-corrected standard errors and block bootstrap. The placebo-in-time test provides a permutation p-value. SEO effects are measured by organic clicks and impressions to treated pages compared to site-wide trends, confirming no significant distortion.
Reproducibility: The authors release their analyzed indexed time series data, analysis code (including ITS regression, Newey–West covariance estimator, placebo test), and plotting scripts as supplementary material. Raw traffic counts are withheld due to commercial confidentiality, but scale and relative metrics are stable and publicly verifiable given the code and indexing.
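The segmented regression described under Architecture / Algorithm can be written out as follows. This is a sketch of a standard ITS parameterization: the summary confirms only that $\exp(\beta_2)$ is the level-break multiplier and that $t_0$ is the intervention date; the indicator $D_t$ and the numbering of the other coefficients are assumptions made here.

```latex
\begin{align*}
\log \frac{Y^{T}_t}{Y^{C}_t}
  &= \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 (t - t_0) D_t + \varepsilon_t,
  \qquad D_t = \mathbf{1}[t \ge t_0]
\end{align*}
```

Under this form, $\beta_1$ is the pre-intervention trend of the log ratio, $\exp(\beta_2)$ is the multiplicative level shift at the intervention, and $\beta_3$ is the change in slope afterwards. Here $Y^{T}_t$ and $Y^{C}_t$ denote weekly ChatGPT referral sessions to treated and control pages.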

Concrete Example: Weekly data from July 2025 through May 2026 show that referrals to both treated and control pages grew, reflecting platform growth. Taking the log ratio of treated to control referrals over time, the ITS model identifies an upward pre-intervention trend and estimates a discrete 1.82-fold level increase aligned with the January 2026 AEO rollout. The estimate is stable across several robustness checks, but the placebo p=0.16 calls for caution in interpretation. Organic clicks to treated pages declined roughly in line with the site-wide trend, which argues against treatment-specific SEO harm.
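To make the estimator concrete, here is a minimal, dependency-free sketch of an ITS fit with Newey-West (Bartlett kernel) standard errors on a synthetic log-ratio series. It is not the authors' released code: the function names, the lag choice of 4, the normal critical value 1.96, and the synthetic trend and noise levels are all assumptions for illustration. Only the 26/21 week split comes from the summary.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def design(n, t0):
    """Rows of [1, t, D_t, (t - t0) * D_t] with D_t = 1 for t >= t0."""
    return [[1.0, float(t), float(t >= t0), float(max(t - t0, 0))] for t in range(n)]


def fit_its(y, t0, lags):
    """OLS coefficients plus Newey-West (Bartlett kernel) standard errors."""
    x = design(len(y), t0)
    k, n = 4, len(y)
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yt - sum(b * v for b, v in zip(beta, r)) for r, yt in zip(x, y)]
    u = [[v * e for v in r] for r, e in zip(x, resid)]  # score vectors x_t * e_t
    # "Meat" of the sandwich: lag-0 term plus Bartlett-weighted autocovariances.
    s = [[sum(ut[i] * ut[j] for ut in u) for j in range(k)] for i in range(k)]
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)
        for i in range(k):
            for j in range(k):
                s[i][j] += w * sum(u[t][i] * u[t - lag][j] + u[t - lag][i] * u[t][j]
                                   for t in range(lag, n))
    # Columns of (X'X)^-1; the matrix is symmetric, so column m equals row m.
    inv = [solve(xtx, [float(i == m) for i in range(k)]) for m in range(k)]
    se = [math.sqrt(max(sum(inv[m][i] * s[i][j] * inv[m][j]
                            for i in range(k) for j in range(k)), 0.0))
          for m in range(k)]
    return beta, se


random.seed(0)
N_PRE, N_POST = 26, 21        # week counts reported in the summary
TRUE_LEVEL = math.log(1.8)    # hypothetical level shift for the synthetic series
series = [0.02 * t + (TRUE_LEVEL if t >= N_PRE else 0.0) + random.gauss(0, 0.1)
          for t in range(N_PRE + N_POST)]
beta, se = fit_its(series, t0=N_PRE, lags=4)
level = math.exp(beta[2])
ci = (math.exp(beta[2] - 1.96 * se[2]), math.exp(beta[2] + 1.96 * se[2]))
print(f"level multiplier {level:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

Because the regression is on the log ratio, the coefficient and its interval are exponentiated to report a multiplier, which is why the published interval is asymmetric around the point estimate.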

Technical innovations

Use of an on-domain untreated subset as a contemporaneous control to separate treatment effects from platform-wide tailwinds in measuring LLM answer engine referral growth.
Application of a segmented regression interrupted time-series (ITS) model on the logged treated/control traffic ratio with Newey–West HAC standard errors to model discrete level breaks and linear trends.
Introduction of a conservative placebo-in-time permutation test to quantify if observed intervention-aligned traffic jumps exceed plausible pre-intervention noise.
Bundled AEO intervention including URL canonicalization, demand mining from AI bot 404 logs, Q&A-style content rewriting, and SEO-protection guardrails to balance organic search preservation with answer engine optimization.
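The placebo-in-time idea listed above can be sketched in a few lines. This is a simplified stand-in, not the paper's procedure: instead of re-fitting the full segmented regression at each fake date, it uses a difference of window means as the jump estimator, and the (1 + count) / (1 + n) p-value convention, the window length, and the synthetic series are assumptions chosen here.

```python
import random


def level_jump(series, t, window):
    """Mean of the `window` points from t onward minus the mean of the `window` points before t."""
    before = series[t - window:t]
    after = series[t:t + window]
    return sum(after) / window - sum(before) / window


def placebo_in_time(series, t0, window):
    """Compare the jump at the real break t0 with jumps at fake pre-period breaks."""
    observed = level_jump(series, t0, window)
    # Fake break dates whose before/after windows both lie entirely in the pre-period.
    placebo_dates = range(window, t0 - window + 1)
    placebo = [level_jump(series, t, window) for t in placebo_dates]
    extreme = sum(1 for j in placebo if abs(j) >= abs(observed))
    # The +1 counts the observed break itself, so p can never be exactly zero.
    p_value = (1 + extreme) / (1 + len(placebo))
    return observed, placebo, p_value


random.seed(1)
T0 = 26  # pre-period length in weeks, as reported in the summary
# Hypothetical log-ratio series: noisy pre-period, then a level shift of 0.6.
demo = [random.gauss(0, 0.3) + (0.6 if t >= T0 else 0.0) for t in range(T0 + 21)]
obs, fake_jumps, p = placebo_in_time(demo, t0=T0, window=4)
print(f"observed jump {obs:.2f}, {len(fake_jumps)} placebo dates, p = {p:.2f}")
```

With a short pre-period there are few usable fake dates, so the smallest attainable p-value is coarse; this is one way a noisy, short pre-period can leave a sizable estimate short of conventional significance.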

Datasets

glasp.co /youtube/ corpus — hundreds of thousands of question-and-answer pages — proprietary high-traffic web domain
glasp.co untreated pages — thousands to tens of thousands of sessions/week — same domain, used as contemporaneous control
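The treated/control split described above reduces to a path-prefix rule plus a referral-source filter. The sketch below shows one way to build the weekly log ratio from exported session rows; the row field names (`week`, `source`, `path`, `sessions`) and the example values are hypothetical, while the `/youtube/` prefix and the "source contains 'chatgpt'" filter come from the summary.

```python
import math
from collections import defaultdict

TREATED_PREFIX = "/youtube/"  # treated corpus; every other path counts as control


def weekly_log_ratio(rows):
    """Return {week: log(treated / control)} for ChatGPT-referred sessions."""
    totals = defaultdict(lambda: {"treated": 0, "control": 0})
    for row in rows:
        if "chatgpt" not in row["source"].lower():
            continue  # keep only sessions whose source mentions chatgpt
        group = "treated" if row["path"].startswith(TREATED_PREFIX) else "control"
        totals[row["week"]][group] += row["sessions"]
    ratios = {}
    for week in sorted(totals):
        counts = totals[week]
        # The log ratio is undefined when either group has zero sessions.
        if counts["treated"] > 0 and counts["control"] > 0:
            ratios[week] = math.log(counts["treated"] / counts["control"])
    return ratios


example_rows = [
    {"week": "2026-W02", "source": "chatgpt.com", "path": "/youtube/abc", "sessions": 30},
    {"week": "2026-W02", "source": "chatgpt.com", "path": "/articles/xyz", "sessions": 12},
    {"week": "2026-W02", "source": "google", "path": "/youtube/abc", "sessions": 400},
]
print(weekly_log_ratio(example_rows))
```

Weeks with a zero count in either group are dropped here for simplicity; the summary notes that the authors instead excluded the low-volume first half of 2025 from the pre-period.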

Baselines vs proposed

Untreated control pages: ChatGPT referrals growth = 3.5× (Jan-May 2026)
Treated pages (AEO applied): ChatGPT referrals growth = 6.1× (Jan-May 2026)
Difference-in-differences ratio of growth rates = 1.75× (sessions), 2.48× (engaged sessions)
ITS segmented regression level break estimate: exp(β2) = 1.82× (95% CI 1.31–2.54) vs no level break (1.0×)
Placebo-in-time permutation test for level break yielded p=0.16 vs standard significance threshold p=0.05
Organic Google clicks: treated /youtube/ pages fell ≈25% vs a site-wide decline of ≈20%, i.e. no treatment-specific decline beyond the ambient trend
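The headline numbers above can be cross-checked with simple arithmetic. The sketch below uses only the rounded figures quoted in this summary; the normal critical value 1.96 is an assumption, since the summary does not say which reference distribution the authors used. With rounded inputs the naive ratio comes out near 1.74, slightly below the reported 1.75×, presumably because the paper works from unrounded counts.

```python
import math

treated_growth = 6.1  # raw growth of treated pages, Jan to May 2026
control_growth = 3.5  # raw growth of untreated pages, same window

# Naive difference-in-differences: treated growth relative to control growth.
naive_ratio = treated_growth / control_growth

# Share of the treated pages' log growth that control pages also show (the tailwind).
tailwind_share = math.log(control_growth) / math.log(treated_growth)

# A CI built on the log scale is symmetric there, so its geometric midpoint
# should sit close to the reported point estimate of 1.82.
ci_low, ci_high = 1.31, 2.54
log_center = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)

# Implied standard error of the log-scale coefficient (assumes a 1.96 critical value).
implied_log_se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

print(f"naive ratio        {naive_ratio:.3f}")
print(f"tailwind share     {tailwind_share:.3f}")
print(f"CI log-midpoint    {log_center:.3f}")
print(f"implied log SE     {implied_log_se:.3f}")
```

The tailwind share makes the paper's main point numerically: on the log scale, most of the treated pages' raw growth is matched by pages that received no AEO treatment at all.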

Limitations

Non-randomized observational design; the intervention date t0 is approximated and rollout was gradual, limiting causal certainty.
Short pre-intervention period with autocorrelation and noise yields a placebo p-value of 0.16, indicating the intervention effect is suggestive but not conclusive.
The intervention is a bundled combination of four tactics, so individual component effects are not separately identified.
Only a single domain (glasp.co) and a single dominant answer engine platform (ChatGPT) are studied, limiting external and platform generalizability.
Content heterogeneity between treated and control groups exists; groups differ by page type and intent, not matched or randomized.
The mid-March 2026 bot-filtering policy change altered engagement composition; this necessitates additional robustness checks and introduces a further potential confounder.
Absolute traffic numbers and confidential details are withheld, limiting exact replication; only indexed relative metrics and code are released.

Open questions / follow-ons

What is the isolated causal effect of individual AEO components such as URL canonicalization or demand mining when randomized and separated?
How generalizable are these findings across other domains, content types, and alternative LLM answer engines like Perplexity or Gemini?
Can randomized controlled trials or matched experimental designs confirm or refine the suggestive effect observed here?
How does long-term dynamic interplay between answer engine optimization and organic SEO evolve beyond the short 5-month window, especially with changing bot filtering policies?

Why it matters for bot defense

Bot-defense and CAPTCHA practitioners aiming to understand referral traffic from LLM-based answer engines should note that referral growth from such AI-driven sources can experience large platform-wide tailwinds independent of targeted optimization efforts. This study advises using robust natural experiments with appropriate contemporaneous controls—ideally on-domain untreated subsets—to separate true optimization effects from confounding platform growth. For CAPTCHA contexts, this suggests that observed increases in suspected bot traffic or referral spikes linked to answer engines should be cautiously interpreted against underlying platform-wide trends. Furthermore, the paper highlights that optimizing content for LLM referrals (Answer Engine Optimization) need not come at the expense of organic search performance, reducing concerns about conflicts between SEO and bot defense strategies. Lastly, the proposed measurement methodology using first-party analytics and interrupted time series with placebo testing provides a rigorous statistical framework to evaluate interventions affecting LLM referral traffic in live production settings, which can inform data-driven bot defense tuning and understanding referral traffic provenance.

Cite

bibtex

@article{arxiv2606_04362,
  title={Disentangling Answer Engine Optimization from Platform Growth: A Log-Based Natural Experiment on ChatGPT Referral Traffic},
  author={Keisuke Watanabe and Kazuki Nakayashiki},
  journal={arXiv preprint arXiv:2606.04362},
  year={2026},
  url={https://arxiv.org/abs/2606.04362}
}
