Mango — Multi-Agent Web Navigation via Global-View Optimization

Source: arXiv:2604.18779 · Published 2026-04-20 · By Weixi Tong, Yifeng Di, Tianyi Zhang

TL;DR

MANGO addresses an inefficiency in LLM-based web navigation agents: they start exploring from the root URL without using the site's global structure. Many real websites are large and deeply hierarchical, so an agent that starts at the homepage wastes budget on irrelevant pages, falls into traps, or never reaches the target. MANGO is a multi-agent navigation framework that first builds a lightweight global view of the site through partial crawling and search-based augmentation, then selects query-relevant candidate URLs as entry points. URL selection is modeled as a multi-armed bandit problem and optimized with Thompson Sampling, which adaptively allocates the navigation budget across promising URLs. An episodic memory module stores past navigation trajectories and reflections to avoid redundant attempts.

Evaluated on the public benchmarks WebVoyager and WebWalkerQA with multiple LLM backbones, including GPT-5-mini and Qwen3 variants, MANGO consistently outperforms state-of-the-art baselines. With GPT-5-mini it reaches a 63.6% success rate on WebVoyager versus 56.3% (+7.3 percentage points) and 52.5% on WebWalkerQA versus 25.7% (+26.8 percentage points). Ablations support the importance of global structure analysis and Thompson Sampling over random or Google-only URL selection and an MCTS alternative. Together, the global view and adaptive budget management make navigation in complex sites more efficient and scalable.

Key findings

MANGO achieves a 63.6% success rate on WebVoyager with the GPT-5-mini backbone, outperforming the AgentOccam baseline (56.3%) by 7.3 percentage points and WebWalker (16.3%) by 47.3 percentage points.
On WebWalkerQA, MANGO obtains a 52.5% success rate with GPT-5-mini, surpassing WebWalker (25.7%) by 26.8 percentage points and AgentOccam (20.3%) by 32.2 percentage points.
Performance scales with backbone size in Qwen3 models, improving from 17.1% SR at 4B to 28.4% at 32B on WebWalkerQA.
In ablations against the full system, variants that use random or Google-only candidate URL sets lose up to 7 percentage points of success rate, and the MCTS-guided navigation variant loses up to 17.1 percentage points.
MANGO maintains a competitive or lower average action count than the baselines with Qwen3 backbones, indicating efficiency; with GPT-5-mini it uses more actions because it tackles harder tasks.
Ablation and sensitivity studies find that a navigation budget b=10, top-10 candidate URLs, crawl limit τ=1000, and Thompson Sampling with 10 iterations balance exploration and exploitation best.
Among failure cases, 52.4% stem from exceeding the navigation budget, 24.6% from landing on wrong pages, 15.4% from reasoning errors by the LLM extractor, 5.6% from out-of-date gold answers, and 2.0% from reflection errors.
Limited structural coverage and imperfect relevance estimation during global analysis remain bottlenecks for extremely large or deeply nested websites.

Threat model

The paper defines no explicit adversary: the work focuses on improving benign web navigation rather than on adversarial attacks or defenses. The LLM agent operates in a budget- and environment-constrained setting without malicious interference. The method assumes that no party manipulates site content or structure to deceive navigation.

Methodology — deep read

The setting is adversary-free: the goal is to improve the efficiency and accuracy of a web navigation agent, not its adversarial robustness. The agent operates under a budget constraint that caps the number of navigation actions per task.

The data come from two public web navigation benchmarks. WebVoyager consists of 129 filtered QA tasks with golden answers from popular real websites. WebWalkerQA contains 680 navigation tasks across domains such as conferences, education, organizations, and gaming, with both single- and multi-source QA requiring deep or multi-page exploration. No human reannotation was performed beyond the original benchmarks.

MANGO first performs a lightweight breadth-first web crawl with a configurable page limit (τ=1000) from the root URL to collect in-domain URLs while filtering out non-HTML and external links. Crawled pages are scored by BM25 relevance to the user query. Concurrently, for large sites where crawling is incomplete, query-generated keywords are used with Google site-specific search to retrieve additional relevant URLs. The top 10 URLs from each source form the candidate set U.
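The crawl-and-score step can be illustrated with a minimal sketch. It assumes a tiny in-memory site (`SITE`, with made-up URLs and text) in place of real HTTP fetching, approximates the in-domain filter with a simple prefix check, and uses a textbook BM25 with common default parameters (`k1`, `b_len`) that are not taken from the paper. The Google site-search augmentation is not modeled.

```python
import math
from collections import Counter, deque

# Hypothetical in-memory site: URL -> (page text, outgoing links).
SITE = {
    "https://example.org/": ("welcome home", [
        "https://example.org/events",
        "https://example.org/about",
        "https://other.net/x",  # external link, should be filtered out
    ]),
    "https://example.org/events": ("conference events schedule",
                                   ["https://example.org/events/deadlines"]),
    "https://example.org/about": ("about the organization team", []),
    "https://example.org/events/deadlines": (
        "paper submission deadlines for the conference", []),
}

def crawl(root, limit):
    """Breadth-first crawl from root, keeping at most `limit` in-domain pages."""
    seen, queue, pages = {root}, deque([root]), []
    while queue and len(pages) < limit:
        url = queue.popleft()
        if url not in SITE:  # treat unknown pages as failed fetches
            continue
        pages.append(url)
        for link in SITE[url][1]:
            # Crude in-domain check (assumption): same URL prefix as the root.
            if link.startswith(root) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def bm25_scores(query, docs, k1=1.5, b_len=0.75):
    """Score each document against the query with a standard BM25 formula."""
    tokens = {url: text.lower().split() for url, text in docs.items()}
    n = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n
    doc_freq = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for url, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b_len + b_len * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[url] = score
    return scores

pages = crawl("https://example.org/", limit=1000)
scores = bm25_scores("conference submission deadlines",
                     {url: SITE[url][0] for url in pages})
top = sorted(scores, key=scores.get, reverse=True)[:10]
print(top[0])
```

In this toy site the deeply nested deadlines page ranks first, which is the kind of entry point a homepage-first agent would only reach after several clicks.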

URL selection from U is formulated as a finite-lifetime multi-armed bandit problem. Each URL is an arm with a Beta distribution whose prior is derived from its normalized BM25 score ρ: α(0) = 1 + κ·ρ and β(0) = 1 + κ·(1−ρ), where κ tunes the prior strength. At each selection step, Thompson Sampling draws θ from Beta(α, β) for each active URL and picks the URL with the largest sample.
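A minimal sketch of the prior initialization and one Thompson Sampling draw follows. The min-max normalization used to obtain ρ, the value `kappa=2.0`, and the function names are assumptions made for illustration; the summary only states that BM25 scores are normalized and that κ controls prior strength.

```python
import random

def init_priors(bm25, kappa=2.0):
    """Turn raw BM25 scores into Beta(alpha, beta) priors per URL."""
    lo, hi = min(bm25.values()), max(bm25.values())
    priors = {}
    for url, score in bm25.items():
        # Min-max normalization is an assumption; ties fall back to rho = 0.5.
        rho = 0.5 if hi == lo else (score - lo) / (hi - lo)
        priors[url] = (1 + kappa * rho, 1 + kappa * (1 - rho))
    return priors

def select_url(priors, rng):
    """Sample theta ~ Beta(alpha, beta) for every arm and return the argmax."""
    samples = {url: rng.betavariate(a, b) for url, (a, b) in priors.items()}
    return max(samples, key=samples.get)

priors = init_priors({"a": 2.0, "b": 0.0}, kappa=2.0)
rng = random.Random(0)
print(priors)                 # "a" gets Beta(3, 1), "b" gets Beta(1, 3)
print(select_url(priors, rng))
```

With these priors the higher-scored URL is usually sampled first, but the lower-scored one still has a nonzero chance of being tried, which is the exploration behavior the bandit formulation is meant to provide.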

The web navigation agent, powered by an LLM backbone such as GPT-5-mini or a Qwen3 variant, receives the user query and the selected URL. It interacts with a standardized browser environment (Playwright for WebVoyager, Crawl4AI for WebWalkerQA) under a budget b that caps the number of actions (b=10). When a URL is revisited, prior navigation trajectories and reflection summaries from episodic memory are appended to the prompt as context.

After navigation, a reflection agent evaluates task completeness and relevance using natural language prompts. Depending on the assessment, a reward r ∈ {0, 1} updates the Beta distribution parameters for Thompson Sampling according to α ← α + r and β ← β + (1−r). URLs deemed dead ends are marked exhausted and excluded from further selection.
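The posterior update and the exhaustion rule can be sketched as below. The `Arm` class, the `dead_end` flag, and the example priors are illustrative names and values, not the authors' code; the update itself is the standard Beta-Bernoulli rule stated above.

```python
from dataclasses import dataclass

@dataclass
class Arm:
    alpha: float
    beta: float
    exhausted: bool = False

    @property
    def mean(self):
        """Posterior mean of the arm's success probability."""
        return self.alpha / (self.alpha + self.beta)

def update(arm, reward, dead_end=False):
    """Apply alpha <- alpha + r and beta <- beta + (1 - r) for r in {0, 1}."""
    if reward not in (0, 1):
        raise ValueError("reward must be 0 or 1")
    arm.alpha += reward
    arm.beta += 1 - reward
    if dead_end:
        arm.exhausted = True  # excluded from all later selection rounds

def active(arms):
    """URLs that are still eligible for Thompson Sampling."""
    return [url for url, arm in arms.items() if not arm.exhausted]

arms = {"u1": Arm(1.5, 1.5), "u2": Arm(2.0, 1.0)}
update(arms["u1"], reward=0, dead_end=True)  # failed attempt on a dead end
update(arms["u2"], reward=1)                 # reflection judged this a success
print(active(arms), arms["u2"].mean)
```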

The episodic memory component stores full navigation trajectories, reflections, and final outputs, enabling retrieval to guide future attempts and avoid repeating mistakes.
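One way to picture the episodic memory is as a per-URL log of attempts that is rendered into prompt context on a revisit. The sketch below is a hypothetical data structure; the field names, the action strings, and the context format are assumptions, since the summary does not describe the storage layout.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    url: str
    actions: list      # e.g. ["click:Events", "scroll"]
    reflection: str    # natural-language assessment from the reflection agent
    output: str        # final answer text, possibly empty

@dataclass
class EpisodicMemory:
    episodes: dict = field(default_factory=dict)  # url -> list of Episode

    def store(self, episode):
        self.episodes.setdefault(episode.url, []).append(episode)

    def context_for(self, url):
        """Render past attempts on this URL as text to append to the prompt."""
        lines = []
        for i, ep in enumerate(self.episodes.get(url, []), start=1):
            trail = " -> ".join(ep.actions)
            lines.append(f"Attempt {i}: {trail}; reflection: {ep.reflection}")
        return "\n".join(lines)

memory = EpisodicMemory()
memory.store(Episode("https://example.org/events",
                     ["click:Schedule", "scroll"],
                     "schedule page lists talks but no deadlines", ""))
print(memory.context_for("https://example.org/events"))
```

A first visit yields an empty context string, so the navigation prompt is unchanged until there is something to recall.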

Experiments on both benchmarks use five LLM backbones in a zero-shot setting with default API parameters. The navigation budget b and the number of Thompson Sampling iterations are each capped at 10 per task. Each configuration is evaluated in a single run and reported with two metrics: success rate (SR) and average action count (AC). Baseline comparisons against AgentOccam and WebWalker replicate their original environments and evaluation.

An ablation study compares MANGO to variants with random URL candidate selection (MANGOrandom), Google-only candidate URLs (MANGOgoogle), and MCTS-guided navigation. Sensitivity analyses vary hyperparameters like b, κ, τ to determine optimal settings.

Overall, the method combines a global structural crawl and search to narrow down navigation start points, adaptive URL selection with Thompson Sampling to allocate budget efficiently, episodic memory to avoid redundant attempts, and reflection assessment to guide updates. This pipeline significantly improves success rates and is reproducible via available code and data.
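Putting the pieces together, the outer loop can be sketched as follows. This is an orchestration sketch under stated assumptions: `navigate` is a stand-in for the LLM navigation and reflection agents, stopping on the first success is an assumption about the termination rule, and all URLs and answers are made up.

```python
import random

def run_task(candidates, navigate, iterations=10, seed=0):
    """candidates: url -> [alpha, beta]. Returns (answer, url) or (None, None)."""
    rng = random.Random(seed)
    memory = {}        # url -> notes from earlier attempts on that URL
    exhausted = set()  # URLs judged to be dead ends
    for _ in range(iterations):
        active = [u for u in candidates if u not in exhausted]
        if not active:
            break
        # Thompson Sampling: one Beta draw per active arm, pick the largest.
        url = max(active, key=lambda u: rng.betavariate(*candidates[u]))
        answer, dead_end = navigate(url, memory.get(url, []))
        reward = 1 if answer is not None else 0
        candidates[url][0] += reward      # alpha <- alpha + r
        candidates[url][1] += 1 - reward  # beta  <- beta + (1 - r)
        memory.setdefault(url, []).append("answered" if reward else "no answer")
        if dead_end:
            exhausted.add(url)
        if reward:
            return answer, url
    return None, None

def fake_navigate(url, past_notes):
    """Stub agent: only the deadlines page holds the answer in this toy site."""
    if url.endswith("/deadlines"):
        return "May 1", False
    return None, True  # every other page is a dead end

cands = {
    "https://example.org/about": [1.0, 3.0],
    "https://example.org/events/deadlines": [3.0, 1.0],
}
answer, found_at = run_task(cands, fake_navigate)
print(answer, found_at)
```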

Technical innovations

Formulating query-relevant URL selection as a finite-lifetime multi-armed bandit problem with Thompson Sampling to dynamically allocate navigation budget.
Integrating lightweight global website structure construction via partial crawling plus search-engine augmentation to identify promising navigation entry points.
Introducing an episodic memory module to store and retrieve navigation trajectories and reflections, preventing repeated exploration of unpromising URLs.
Using a reflection agent with natural language prompts to evaluate navigation quality and update bandit rewards, enabling adaptive exploration-exploitation balance.

Datasets

WebVoyager — 129 filtered QA tasks with golden answers — public benchmark
WebWalkerQA — 680 web navigation tasks (single- and multi-source QA) — public benchmark

Baselines vs proposed

AgentOccam (WebVoyager, GPT-5-mini): Success rate = 56.3% vs MANGO: 63.6%
WebWalker (WebVoyager, GPT-5-mini): Success rate = 16.3% vs MANGO: 63.6%
AgentOccam (WebWalkerQA, GPT-5-mini): Success rate = 20.3% vs MANGO: 52.5%
WebWalker (WebWalkerQA, GPT-5-mini): Success rate = 25.7% vs MANGO: 52.5%
MANGOrandom (WebVoyager, GPT-5-mini): SR = 56.6% vs MANGO: 63.6%
MANGOgoogle (WebVoyager, GPT-5-mini): SR = 59.7% vs MANGO: 63.6%
MANGOMCTS (WebVoyager, GPT-5-mini): SR = 46.5% vs MANGO: 63.6%
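The absolute gains quoted in this summary follow directly from the success rates listed above. A short check, using only those reported numbers (the variable names are illustrative):

```python
# Reported MANGO success rates (%) with GPT-5-mini.
MANGO_SR = {"WebVoyager": 63.6, "WebWalkerQA": 52.5}

# (system, benchmark, reported success rate in %).
REPORTED = [
    ("AgentOccam", "WebVoyager", 56.3),
    ("WebWalker", "WebVoyager", 16.3),
    ("AgentOccam", "WebWalkerQA", 20.3),
    ("WebWalker", "WebWalkerQA", 25.7),
    ("MANGOrandom", "WebVoyager", 56.6),
    ("MANGOgoogle", "WebVoyager", 59.7),
    ("MANGOMCTS", "WebVoyager", 46.5),
]

# Gain in percentage points, rounded to one decimal to absorb float noise.
gains = {
    (name, bench): round(MANGO_SR[bench] - sr, 1)
    for name, bench, sr in REPORTED
}

for (name, bench), gain in gains.items():
    print(f"{name:12s} {bench:12s} MANGO +{gain} points")
```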

Limitations

Global structure construction relies on lightweight crawling and Google search, providing only partial coverage, which may miss deeply nested target pages.
Strict navigation budgets cause failures when early URL selections are suboptimal due to imperfect relevance estimation, leading to irreversible exploration costs.
Reasoning errors by LLM backbone during information extraction cause incorrect or hallucinated answers despite correct navigation.
The approach was not tested on highly dynamic or frequently changing websites where crawling and indexing may be stale.
Reflection module occasionally misclassifies partial or inadequate results as complete, prematurely terminating navigation.
Experiments are limited to two benchmarks; real-world web heterogeneity may present additional challenges.

Open questions / follow-ons

How to improve global structure construction coverage and freshness for massive, dynamic websites?
Can the bandit URL selection integrate contextual or interactive feedback beyond fixed Beta priors?
How to better handle reasoning and hallucination errors of underlying LLMs during information extraction?
Can episodic memory incorporate generalization to unseen but related URLs beyond exact retrieval?

Why it matters for bot defense

From a bot-defense perspective, MANGO shows how multi-agent systems that exploit global site structure can navigate complex web environments efficiently under resource constraints. Its use of Thompson Sampling for URL prioritization and of explicit episodic memory to reduce redundancy illustrates adaptive exploration strategies that sophisticated bots might employ. CAPTCHA designers should consider that attackers could optimize multi-entry URL selection and adaptive navigation budgets rather than naively starting from homepages. Furthermore, because episodic history helps the agent avoid traps, defenses that rely on "navigation traps" or dead-end pages must be coupled with mechanisms that limit knowledge reuse or adaptive reflection by bots. Understanding this global optimization approach may help in designing layered bot defenses that detect multi-agent coordination or anomalous navigation patterns driven by structural knowledge rather than by local page clicks alone.

Cite

bibtex

@article{arxiv2604_18779,
  title={Mango: Multi-Agent Web Navigation via Global-View Optimization},
  author={Weixi Tong and Yifeng Di and Tianyi Zhang},
  journal={arXiv preprint arXiv:2604.18779},
  year={2026},
  url={https://arxiv.org/abs/2604.18779}
}
