Not All Roads Lead to Rome: How VPN Selection Alters What We Measure and Infer about Web Infrastructure

Source: arXiv:2605.30692 · Published 2026-05-29 · By Sachin Kumar Singh, Robert Ricci, Alexander Gamero-Garrido

TL;DR

This paper tests a common assumption in web measurement studies: that commercial VPN providers can be treated as interchangeable vantage points within the same source country. The authors demonstrate that this assumption is false. Measurements made through different VPN providers from the same country produce materially different results about endpoint IP addresses, the countries those endpoints reside in, and the organizations that host them. They use VPN-Scope, a scalable measurement framework that orchestrates synchronized web crawls across four major VPNs and 14 countries, to collect over half a million website snapshots with detailed HTTP Archive traces. Complementary DNS resolution and replica-probing experiments isolate the sources of variability in three network layers: vantage identity, DNS resolution, and CDN replica selection. Most variability arises in layers below the client: VPNs intercept DNS queries with distinct resolvers, CDN anycast routing steers traffic differently by VPN exit network, and interconnection paths route requests to different physical replicas. These differences persist across repeated runs and vary significantly by both VPN provider and country. Although divergences diminish when IP addresses are aggregated to hosting organizations, VPN choice strongly biases infrastructure-level inferences about geolocation and endpoint placement. The authors recommend detailed reporting of VPN modality in measurement studies and caution against assuming VPN neutrality in data localization, CDN reachability, or hosting dependency analyses.

Key findings

The cross-VPN exact-match rate for DNS A-record sets is 88.0% on initial domain queries and only 69.0% on web dependency domains, showing substantial IP-level divergence induced by VPN choice.
Nearly one in eight VPN-pair DNS queries disagreed on IP sets for initial domains, and nearly one in three for dependency domains, with disagreements typically being disjoint IP sets.
Browser-driven measurements reveal that endpoint country distributions vary significantly across VPNs from the same source country, with some VPNs reaching in-country endpoints for over 60% of traffic while others reach endpoints in a different country for more than 50% (e.g., Chile: NordVPN routes 64.2% locally vs ExpressVPN 57.1% to Brazil).
Variability is significantly reduced when aggregating IP addresses to hosting organizations, preserving inferences about third-party hosting providers despite VPN differences.
VPN providers operate distinct in-country DNS resolvers that intercept queries regardless of the client's resolver setting, accounting for the majority of DNS-layer variability.
CDN anycast and replica selection steer identical queries differently depending on VPN exit networks, further diversifying endpoint sets.
Peering and routing paths dictate how DNS answers reach different physical facilities, introducing additional variability in endpoint location.
These VPN-induced effects are persistent, statistically detectable, and vary by provider-country combinations, limiting generalizability of measurements across VPNs.
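
The exact-match rate in the first two findings can be illustrated with a minimal sketch. The data layout, provider labels, and function name below are assumptions made for illustration and are not taken from VPN-Scope; the IP addresses come from documentation ranges.

```python
from itertools import combinations


def exact_match_rate(answers):
    """Fraction of (domain, provider-pair) comparisons with identical A-record sets.

    answers: {domain: {provider: set_of_ips}} for one source country
    (a hypothetical layout chosen for this sketch).
    """
    matches = 0
    total = 0
    for per_provider in answers.values():
        # Compare every unordered pair of providers for this domain.
        for a, b in combinations(sorted(per_provider), 2):
            total += 1
            if per_provider[a] == per_provider[b]:
                matches += 1
    return matches / total if total else float("nan")


# Toy data: three hypothetical providers, two domains.
answers = {
    "example.org": {
        "vpn_a": {"192.0.2.1"},
        "vpn_b": {"192.0.2.1"},
        "vpn_c": {"198.51.100.7"},
    },
    "example.net": {
        "vpn_a": {"203.0.113.5"},
        "vpn_b": {"203.0.113.5"},
        "vpn_c": {"203.0.113.5"},
    },
}
rate = exact_match_rate(answers)  # 4 matching pairs out of 6

# The "one in eight" phrasing follows from the reported 88.0% match rate:
disagreement_initial = 1 - 0.880      # about 0.12, i.e. roughly 1 in 8.3
disagreement_dependency = 1 - 0.690   # about 0.31, i.e. roughly 1 in 3.2
```

A pair counts as a match only when the two sets are identical, so a single differing address makes the whole comparison a disagreement.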

Threat model

The paper treats measurement researchers as benign observers and focuses on the passive influence of commercial VPNs as opaque infrastructural intermediaries rather than malicious adversaries. The adversary in this context is effectively the VPN provider, whose operational choices (DNS resolvers, network paths, peering) can silently bias or alter measurement outcomes. No active attacker manipulating data is assumed beyond the provider's normal infrastructure. The study assesses how these opaque intermediaries can distort inferences about web infrastructure even when the source location is fixed.

Methodology — deep read

The authors developed VPN-Scope, a measurement framework that instantiates each VPN vantage point inside isolated Linux network namespaces to prevent interference between concurrent VPN sessions. This allows controlled, repeatable, scalable measurements across multiple commercial VPN providers connecting to the same source country.
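
One way to picture the per-vantage isolation is the sketch below, which only builds the Linux commands and does not run them. The namespace naming scheme, the function name, and the config path are hypothetical, and the uplink setup (a veth pair plus NAT so the namespace can reach the VPN server) is omitted; VPN-Scope's actual setup is not described at this level in the summary.

```python
import shlex


def namespace_commands(provider, country, ovpn_config):
    """Build (not run) the commands for one isolated VPN vantage point.

    The naming scheme and config path are assumptions for this sketch.
    Running these commands requires root and a working uplink inside
    the namespace, which is deliberately left out here.
    """
    ns = f"vpn-{provider}-{country}".lower()
    commands = [
        # Create a fresh network namespace for this provider/country pair.
        ["ip", "netns", "add", ns],
        # Bring up loopback inside the namespace.
        ["ip", "netns", "exec", ns, "ip", "link", "set", "lo", "up"],
        # Start OpenVPN inside the namespace so its tunnel and routes
        # stay invisible to other concurrent sessions.
        ["ip", "netns", "exec", ns, "openvpn", "--config", ovpn_config, "--daemon"],
    ]
    return ns, commands


ns, commands = namespace_commands("NordVPN", "CL", "/etc/openvpn/nordvpn-cl.ovpn")
for cmd in commands:
    print(shlex.join(cmd))
```

Because each tunnel lives in its own namespace, its routing table and resolver configuration cannot leak into a session for another provider on the same host.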

They selected four popular VPN providers (NordVPN, Surfshark, ExpressVPN, ProtonVPN) based on prior use and popularity, and configured each one manually via OpenVPN over UDP on identical hardware. They validated VPN exit server locations using IPinfo geolocation cross-checked against RIPE Atlas latency probes to confirm physical proximity within 200 km, and discarded sessions that failed validation so that misrepresented locations did not enter the dataset.
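
A latency cross-check of this kind can be sketched as a speed-of-light feasibility test. This is one possible reading, not the authors' procedure: the propagation constant, the way the 200 km tolerance is applied, and the probe tuples are all assumptions for illustration.

```python
import math

# Assumption: signals in fiber travel at roughly 2/3 the speed of light,
# i.e. about 200 km per millisecond one way.
FIBER_KM_PER_MS = 200.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def max_distance_km(rtt_ms):
    """Upper bound on probe-to-server distance implied by a round-trip time."""
    return (rtt_ms / 2) * FIBER_KM_PER_MS


def location_plausible(claimed, probes, tolerance_km=200.0):
    """Reject a claimed exit location that some probe's RTT makes impossible.

    claimed: (lat, lon) from the geolocation database.
    probes: list of (lat, lon, min_rtt_ms) for probes with known positions.
    """
    for lat, lon, rtt_ms in probes:
        distance = haversine_km(lat, lon, claimed[0], claimed[1])
        if distance > max_distance_km(rtt_ms) + tolerance_km:
            return False
    return True


santiago = (-33.45, -70.67)
near_probe = [(-33.40, -70.60, 3.0)]   # probe close to the claimed location
far_probe = [(-23.55, -46.63, 2.0)]    # probe in Sao Paulo with a 2 ms RTT

ok = location_plausible(santiago, near_probe)
bad = location_plausible(santiago, far_probe)
```

In the second case a 2 ms round trip from a probe in São Paulo bounds the server to within a few hundred kilometres of that probe, so a claimed Santiago location is rejected.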

Measurements covered 14 geographically diverse countries, balancing regional representation under practical limits of simultaneous VPN connections (max 10). For each country, they crawled 100 government domains, 100 regional popular domains, and 800 globally popular sites sampled from the Tranco top 1 million list with stratified sampling by popularity tiers.
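
Stratified sampling by popularity tier can be sketched as follows. The tier boundaries, per-tier counts, and the synthetic ranked list are hypothetical; the summary states only that 800 globally popular sites were drawn from the Tranco list with stratification.

```python
import random


def stratified_sample(ranked_domains, tiers, seed=0):
    """Draw a fixed number of domains from each popularity tier.

    ranked_domains: list ordered by rank (index 0 holds rank 1).
    tiers: list of (start_rank, end_rank, n) with 1-based inclusive ranks.
    The tier layout is an assumption made for this sketch.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for start, end, n in tiers:
        pool = ranked_domains[start - 1:end]
        sample.extend(rng.sample(pool, min(n, len(pool))))
    return sample


# Synthetic stand-in for a ranked list (not real Tranco data).
ranked = [f"site{rank}.example" for rank in range(1, 100_001)]

# Hypothetical tiers: 200 domains from each of four rank bands.
tiers = [
    (1, 1_000, 200),
    (1_001, 10_000, 200),
    (10_001, 50_000, 200),
    (50_001, 100_000, 200),
]
sample = stratified_sample(ranked, tiers)
```

Sampling per tier keeps low-ranked sites in the crawl, where a uniform draw over the whole list would be dominated by the long tail.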

They performed 5 repeated runs over 3 weeks, each consisting of 25 measurements per website per VPN. This initially yielded over 1.2 million snapshots, which filtering reduced to 510K valid snapshots by removing non-connected, mislocalized, error, zero-request, and domain-mismatch samples.
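
The five filter categories can be expressed as an ordered chain of checks. The `Snapshot` fields, the order of the checks, and the `www.` normalization are assumptions for illustration, since the summary names the categories but not how each is detected.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    # Hypothetical record layout for one page load.
    connected: bool
    exit_country: str
    expected_country: str
    error: Optional[str]
    request_count: int
    requested_domain: str
    final_domain: str


def _strip_www(domain):
    domain = domain.lower()
    return domain[4:] if domain.startswith("www.") else domain


def rejection_reason(s):
    """Return the first failing check, or None if the snapshot is valid."""
    if not s.connected:
        return "non-connected"
    if s.exit_country != s.expected_country:
        return "mislocalized"
    if s.error:
        return "error"
    if s.request_count == 0:
        return "zero-request"
    if _strip_www(s.final_domain) != _strip_www(s.requested_domain):
        return "domain-mismatch"
    return None


def filter_snapshots(snapshots):
    """Split snapshots into valid ones and a per-reason count of drops."""
    valid, dropped = [], Counter()
    for s in snapshots:
        reason = rejection_reason(s)
        if reason is None:
            valid.append(s)
        else:
            dropped[reason] += 1
    return valid, dropped


snapshots = [
    Snapshot(True, "CL", "CL", None, 42, "example.org", "www.example.org"),
    Snapshot(False, "CL", "CL", None, 0, "example.org", "example.org"),
    Snapshot(True, "BR", "CL", None, 10, "example.org", "example.org"),
    Snapshot(True, "CL", "CL", None, 5, "example.org", "parked.example.net"),
]
valid, dropped = filter_snapshots(snapshots)
```

Recording the reason for each drop makes it possible to report how much each filter removed, which matters when assessing selection bias.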

The core browser-driven measurement launched fresh Selenium-controlled Chrome instances per VPN namespace for each site, recording HTTP Archive (HAR) traces containing all request/response data, including endpoint IPs.
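
Endpoint IPs can be read from a HAR trace through the `serverIPAddress` field that HAR 1.2 defines on each entry. The sketch below parses an inline toy trace; how VPN-Scope captures and processes its HAR files is not described in this summary, so the function name and output layout are assumptions.

```python
import json
from collections import defaultdict
from urllib.parse import urlsplit


def endpoint_ips(har):
    """Map each requested hostname to the set of server IPs that answered.

    har: a parsed HAR document (dict). Entries without a recorded
    serverIPAddress (e.g. cache hits) are skipped.
    """
    ips = defaultdict(set)
    for entry in har.get("log", {}).get("entries", []):
        ip = entry.get("serverIPAddress")
        host = urlsplit(entry["request"]["url"]).hostname
        if ip and host:
            # Some exporters wrap IPv6 addresses in brackets; strip them.
            ips[host].add(ip.strip("[]"))
    return dict(ips)


# Toy HAR trace with documentation-range addresses.
har_text = """
{"log": {"entries": [
  {"request": {"url": "https://example.org/"}, "serverIPAddress": "192.0.2.10"},
  {"request": {"url": "https://cdn.example.net/app.js"}, "serverIPAddress": "[2001:db8::1]"},
  {"request": {"url": "https://cdn.example.net/app.css"}, "serverIPAddress": "198.51.100.4"},
  {"request": {"url": "https://example.org/cached.png"}}
]}}
"""
result = endpoint_ips(json.loads(har_text))
```

Grouping by hostname separates the initial domain from its dependency domains, which is the split used in the DNS findings.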

Complementary controlled DNS experiments issued A-record lookups for 25K domains per vantage point over the VPN-configured resolver to isolate DNS layer behavior. Additionally, probes identified actual routing of DNS queries inside the VPN tunnel and which physical CDN replicas served traffic for identical anycast IP addresses.

Comparisons across VPNs from the same country examined IP set overlaps, geolocated endpoint countries, and organization-level hosting by mapping AS numbers to organizations. Statistical analyses measured exact set matches, Jaccard similarity, and significance across provider-country pairs. They tracked individual instances end-to-end: from initial DNS resolution inside the VPN namespace, to HTTP page loads, to replica identification.
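
The effect of aggregating IP addresses to organizations can be shown with Jaccard similarity computed at both levels. The IP-to-organization table below is a hypothetical stand-in for the AS-number-to-organization mapping the authors describe, and the organization names are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def to_orgs(ips, ip_to_org):
    """Aggregate a set of IPs to the set of organizations hosting them."""
    return {ip_to_org.get(ip, "unknown") for ip in ips}


# Hypothetical mapping; a real pipeline would go IP -> AS number -> organization.
ip_to_org = {
    "192.0.2.10": "ExampleCDN",
    "192.0.2.77": "ExampleCDN",
    "198.51.100.4": "ExampleHost",
}

# The same domain resolved through two providers in the same country.
seen_by_vpn_a = {"192.0.2.10", "198.51.100.4"}
seen_by_vpn_b = {"192.0.2.77", "198.51.100.4"}

ip_similarity = jaccard(seen_by_vpn_a, seen_by_vpn_b)
org_similarity = jaccard(to_orgs(seen_by_vpn_a, ip_to_org), to_orgs(seen_by_vpn_b, ip_to_org))
```

Here the two providers share one of three distinct IPs but the same two organizations, so the disagreement disappears once addresses are mapped to their operators.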

All VPN namespaces were instantiated fresh for each experiment to avoid state carryover. Experiments were synchronized across VPNs for matched timing. The data collection rig used CloudLab physical machines, assigning one machine per country.

Strict data validation and filtering ensured minimal confounding from VPN misconfiguration or temporal effects. The authors highlight remaining uncertainty around obfuscated VPN DNS paths and CDN anycast routing mechanisms due to provider opacity.

VPN-Scope code and sanitized datasets will be released upon publication, facilitating reproducibility.

Technical innovations

VPN-Scope: A namespace-based framework creating fully isolated, parallel VPN vantage points on a single host for synchronized, controlled web measurements.
Systematic empirical characterization of VPN-induced variability in DNS resolution, endpoint geolocation, and replica selection across multiple providers and countries.
Combination of large-scale browser-driven crawling with targeted DNS A-record experiments and anycast replica probing to isolate network-layer sources of measurement divergence.
Use of validated, multi-dimensional filtering combining geolocation, latency verification, and content integrity checks for robust large-scale VPN measurement.
Quantitative analysis demonstrating persistent, statistically significant VPN-provider and country-specific interactions in infrastructure inferences.

Datasets

VPN-Scope dataset — 510,000 valid web snapshots — collected via synchronized browser crawls over 4 VPN providers and 14 countries
DNS A-record dataset — 50,000 domains probed via VPN-configured resolvers — internal to study

Baselines vs proposed

Cross-VPN DNS exact match rate on initial domains: 88.0% vs dependency domains: 69.0%
Endpoint-country agreement across VPNs varies by country and provider, by up to ±40 percentage points (e.g., Chile: Surfshark 64.2% local vs ExpressVPN 57.1% to Brazil)
Jaccard similarity thresholding showed near-binary agreement/disagreement of DNS IP sets without partial overlaps
IP-to-organization mapping reduces inter-VPN variability significantly, preserving hosting provider inferences (numerical deltas not precisely reported)

Limitations

Study includes only 4 commercial VPN providers; results may not generalize to all VPNs or residential proxies.
Geolocation relies on commercial IPinfo data and latency probes which may have residual inaccuracies or coarse granularity.
Controlled DNS experiments do not capture all DNS manipulations or caching behaviors inside VPN providers.
CDN and peering dynamics inferred indirectly; VPN providers’ internal DNS and routing policies remain opaque.
No adversarial evaluation or simulation of active evasion—focus is on measurement variability under normal conditions.
Filtering and validation excluded some data, which may introduce selection bias, especially where VPN exit locations could not be validated.

Open questions / follow-ons

How do other vantage types, like residential proxies or cloud nodes, compare to VPNs in inducing measurement variability?
Can active probing or traceroute-style techniques help further isolate where inside the VPN and ISP paths variability is introduced?
What methods can future measurement studies use to mitigate or normalize VPN-induced biases systematically?
How do VPN-induced effects evolve over time with changes in provider infrastructure and CDN routing policies?

Why it matters for bot defense

Bot-defense and CAPTCHA engineers often rely on geographically distributed vantage points, including commercial VPNs, to measure access patterns, check global reachability, or test regional blocking. This paper cautions that VPN provider choice materially alters the observed web infrastructure endpoints and their locations, which can bias such measurements and any bot-defense decisions that rely on them. For instance, estimating endpoint geolocation or CDN replica availability through different VPNs in the same country can yield conflicting conclusions, potentially affecting fraud detection, latency verification, or geo-fencing logic. Engineers should therefore document which VPN providers they use for web measurements, avoid assuming VPN neutrality, and use multiple providers or complementary vantage types to make conclusions more robust. Knowing which measurement layer (DNS, routing, replica selection) drives variability also suggests where instrumentation or normalization can reduce VPN-induced uncertainty in bot-defense system evaluations.
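
A simple guard in the spirit of this advice is to accept an inference only when enough providers agree on it. The function name, threshold, and provider labels below are hypothetical, and the paper does not prescribe this particular rule.

```python
from collections import Counter


def cross_provider_verdict(observations, min_agreement=1.0):
    """Accept an inferred value only if enough providers agree on it.

    observations: {provider: inferred_value}, e.g. the endpoint country
    seen for one site from one source country.
    min_agreement: required share of providers reporting the top value.
    """
    counts = Counter(observations.values())
    value, n = counts.most_common(1)[0]
    share = n / len(observations)
    return {
        "value": value if share >= min_agreement else None,
        "agreement": share,
        "by_provider": dict(observations),  # kept for reporting
    }


observations = {"vpn_a": "CL", "vpn_b": "CL", "vpn_c": "BR", "vpn_d": "CL"}
strict = cross_provider_verdict(observations)
relaxed = cross_provider_verdict(observations, min_agreement=0.7)
```

Keeping the per-provider values alongside the verdict also supports the recommendation to report which providers were used.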

Cite

bibtex

@article{arxiv2605_30692,
  title={Not All Roads Lead to Rome: How VPN Selection Alters What We Measure and Infer about Web Infrastructure},
  author={Sachin Kumar Singh and Robert Ricci and Alexander Gamero-Garrido},
  journal={arXiv preprint arXiv:2605.30692},
  year={2026},
  url={https://arxiv.org/abs/2605.30692}
}
