Cache to the Future: A Distributed Webpage Archive for Internet Blackouts

Source: arXiv:2606.17245 · Published 2026-06-15 · By Ross Evans, Diogo Barradas

TL;DR

Cache to the Future (CttF) addresses the challenge of maintaining access to web-hosted knowledge during internet blackouts caused by infrastructure failures or state censorship. Prior blackout-resistant technologies mainly enable limited messaging over mobile mesh networks but cannot reliably deliver rich static web content such as reference pages, images, or scripts. In CttF, users pre-cache web pages that trusted proxies sign cryptographically, and they exchange crowd-sourced community ratings over Bluetooth to decide which pages to replicate. During a blackout, users request cached pages, which nearby devices serve locally. The system resists adversarial interference such as Sybil manipulation and jamming through proof-of-work rating exchanges and opportunistic data sharing. Large-scale, city-wide simulations driven by real mobility data show that CttF can serve 75% of requests for 100,000 pages over a two-month blackout, with median latency under 24 hours for the 10,000 most popular pages. Even with 25% Sybil adversaries, request satisfaction drops by only 5% over one week, and an adversary must jam over 2,500 km² to significantly disrupt service. CttF thus offers a practical, scalable approach to blackout-resilient access to substantial web knowledge bases.

Key findings

In a benign scenario, CttF achieves 75% request satisfaction for 100,000 pages over a 2-month blackout.
Median retrieval latency during the blackout is under 24 hours for the 10,000 most popular pages.
With 25% Sybil adversaries, request satisfaction decreases by only 5% over a week-long blackout.
Adversaries must jam 2,500 km² of the 5,000 km² simulation area to cause significant disruption.
Manual seeder users cache 1,200 pages each, corresponding to about 3 GB of storage per device.
Transforming page ratings through a Zipf-like distribution prioritizes caching of both popular and obscure content.
Proof-of-work limits influence of adversarial Sybil nodes during rating exchange, making rapid fake ratings costly.
Epidemic routing is limited by per-session size caps on forwarding, whereas CttF's community caching improves both request satisfaction and latency.
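A hashcash-style puzzle is one common way to realize the partial-hash-collision proof-of-work described above. The sketch below is illustrative rather than the authors' code: SHA-256, the 8-byte big-endian nonce, difficulty measured in leading zero bits, and the names `solve_pow`/`verify_pow` are all assumptions. Binding the proof to the rating payload means a solved puzzle cannot be reused for a different rating, and the cost asymmetry is what limits Sybil spam: solving takes about 2^difficulty hash evaluations, while checking takes one.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is the position of the highest set bit in this byte
        bits += 8 - byte.bit_length()
        break
    return bits


def solve_pow(rating_payload: bytes, difficulty: int) -> int:
    """Find a nonce so SHA-256(payload || nonce) has `difficulty` leading zero bits.

    Expected work is roughly 2**difficulty hash evaluations (hypothetical scheme).
    """
    nonce = 0
    while True:
        digest = hashlib.sha256(rating_payload + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify_pow(rating_payload: bytes, nonce: int, difficulty: int) -> bool:
    """Check a proof with a single hash evaluation."""
    digest = hashlib.sha256(rating_payload + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    # Hypothetical rating payload: page identifier, score, and a random salt
    payload = b"https://example.org/page|score=4|" + os.urandom(8)
    nonce = solve_pow(payload, difficulty=16)
    print(nonce, verify_pow(payload, nonce, difficulty=16))
```

Each extra bit of difficulty roughly doubles the expected solving time, which is the knob a deployment would turn to trade rating throughput for Sybil resistance; the paper's benchmarks express that knob as wall-clock solving time (1 s to 1 min).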

Threat model

The adversary is a state-level censor capable of disabling internet connectivity within a city, controlling some fraction of devices as Sybil nodes that can submit false ratings, drop requests, or manipulate cached content, and selectively jamming radio communications locally. Because Bluetooth is short-range, the adversary cannot globally monitor or block all Bluetooth communications, and individual Sybil devices have computational power comparable to ordinary user devices. The adversary stops short of large-scale arrests or of isolating individual users.
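To make the local jamming capability concrete, the sketch below shows one simple strategy an adversary could follow: jam the most densely populated grid cells, as the methodology describes, until an area budget is used up. It assumes 500 m × 500 m cells and uses the share of observed device presence inside jammed cells as a rough proxy for impact; the paper itself measures disruption through request satisfaction. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (x, y) grid index

CELL_AREA_KM2 = 0.25  # assumption: 500 m x 500 m cells, as in YJMob100K


def choose_jammed_cells(density: Dict[Cell, int], budget_km2: float) -> List[Cell]:
    """Greedily pick the most densely populated cells that fit in the area budget."""
    max_cells = int(budget_km2 // CELL_AREA_KM2)
    ranked = sorted(density, key=lambda c: density[c], reverse=True)
    return ranked[:max_cells]


def coverage(density: Dict[Cell, int], jammed: List[Cell]) -> float:
    """Fraction of observed device presence that falls inside jammed cells."""
    total = sum(density.values())
    return sum(density[c] for c in jammed) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical per-cell counts of trace points
    density = {(0, 0): 50, (0, 1): 30, (1, 0): 15, (1, 1): 5}
    jammed = choose_jammed_cells(density, budget_km2=0.5)
    print(jammed, coverage(density, jammed))  # [(0, 0), (0, 1)] 0.8
```

Greedy density-first selection is the cheapest strategy for the adversary to compute; it maximizes the proxy above but not necessarily the drop in request satisfaction, since pages can still spread through the unjammed cells.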

Methodology — deep read

The authors model a state-level adversary that can induce internet blackouts by disabling internet infrastructure, deploy Sybil nodes, jam Bluetooth communications locally, and manipulate cached content or page ratings. The adversary lacks global network monitoring and does not control all devices, and individual Sybils have computational power similar to that of normal users.

CttF is evaluated using the YJMob100K dataset, which records real mobility traces for 25,000 individuals in a dense Japanese city over 75 days, sampled at 30-minute intervals. This city-scale data captures user movements across 40,000 grid cells of 500 m each. The device population includes seeders (who manually cache and rate pages), leechers (who only cache in the background based on ratings), and adversaries (who inject malicious ratings and perform jamming).

The simulation models pre-blackout rating and caching phases followed by a realistic two-month blackout, during which users request pages through Bluetooth peer-to-peer exchanges. Ratings are averaged locally per page, with at most 1,000 ratings exchanged per session. To limit Sybil rating spam, each rating submission requires solving a proof-of-work challenge (finding a partial hash collision), and the evaluation explores several difficulty settings. Caching priority comes from transforming ratings through a Zipf-like distribution to reflect realistic access skew. During retrieval, devices exchange pages signed by a trusted proxy and verify each signature with the proxy's public key to ensure authenticity.

Adversarial behavior includes rating manipulation, dropping requests, and jamming the most densely populated grid cells. CttF is compared against epidemic routing baselines with limited forwarding buffers and no rating exchange. Evaluation metrics include request satisfaction ratio, page retrieval latency (medians), the distribution of cached copies across pages by popularity, and robustness of ratings against manipulation.
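The local per-page averaging and per-session cap can be sketched as a small store on each device. This is a hypothetical illustration, not CttF's code: the `Rating` record, the `rating_id` used to ignore copies relayed by several peers, and the pluggable `is_valid` check (where a proof-of-work verifier would go) are all assumptions; only the 1,000-rating session cap comes from the summary.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

MAX_RATINGS_PER_SESSION = 1_000  # per-session cap reported for CttF


@dataclass(frozen=True)
class Rating:
    rating_id: str  # hypothetical unique id, e.g. a hash of the rating record
    page: str
    score: float


class RatingStore:
    """Hypothetical per-device store that averages ratings locally per page."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self._sum: defaultdict[str, float] = defaultdict(float)
        self._count: defaultdict[str, int] = defaultdict(int)

    def add(self, rating: Rating) -> bool:
        """Record a rating once; ignore copies relayed again by other peers."""
        if rating.rating_id in self._seen:
            return False
        self._seen.add(rating.rating_id)
        self._sum[rating.page] += rating.score
        self._count[rating.page] += 1
        return True

    def receive_session(self, incoming: Iterable[Rating],
                        is_valid: Callable[[Rating], bool]) -> int:
        """Process at most MAX_RATINGS_PER_SESSION ratings from one peer session."""
        accepted = 0
        for processed, rating in enumerate(incoming):
            if processed >= MAX_RATINGS_PER_SESSION:
                break
            # is_valid is where a proof-of-work check would plug in
            if is_valid(rating) and self.add(rating):
                accepted += 1
        return accepted

    def average(self, page: str) -> Optional[float]:
        """Local mean rating for a page, or None if no ratings were seen."""
        n = self._count.get(page, 0)
        return self._sum[page] / n if n else None


if __name__ == "__main__":
    store = RatingStore()
    batch = [Rating(f"r{i}", "wiki/Bluetooth", 4.0 if i % 2 else 2.0) for i in range(1_500)]
    print(store.receive_session(batch, is_valid=lambda r: True))  # 1000
    print(store.average("wiki/Bluetooth"))  # 3.0
```

Counting invalid ratings toward the cap, as this sketch does, keeps a Sybil peer from stretching a session indefinitely with junk records.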
Micro-benchmarks on an Android prototype measure proof-of-work computational cost and Bluetooth throughput. Simulations run on a multi-CPU server to model large-scale interactions over months of user mobility and page requests.

Technical innovations

A distributed webpage caching and delivery system designed specifically for blackout scenarios using local Bluetooth exchanges.
Integration of community-sourced page ratings exchanged via a proof-of-work protocol to guide adaptive caching while limiting Sybil manipulation.
Use of trusted proxies to cryptographically sign cached webpages pre-blackout, enabling authenticity checks while the network is disrupted.
A page caching prioritization scheme transforming user ratings through a Zipf-like distribution to balance popular and niche content replication.
Evaluation using city-wide, real-world high-resolution mobility traces over extended blackout durations for realistic performance modeling.
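The summary does not spell out the exact Zipf-like transform, so the following is a sketch under stated assumptions: pages are ranked by their locally averaged rating, rank r receives weight 1/r^s (with a hypothetical default s = 1), and a device fills its cache by weighted sampling without replacement using Efraimidis-Spirakis keys. Sampling, rather than always taking the top-k pages, keeps low-ranked pages in play, which is one way to obtain the popular-plus-niche replication the design aims for. Function names are hypothetical.

```python
import random
from typing import Dict, List


def zipf_weights(avg_rating: Dict[str, float], s: float = 1.0) -> Dict[str, float]:
    """Map pages to Zipf-like weights 1 / rank**s, ranking by average rating."""
    ranked = sorted(avg_rating, key=lambda p: avg_rating[p], reverse=True)
    return {page: 1.0 / (rank ** s) for rank, page in enumerate(ranked, start=1)}


def pick_pages_to_cache(weights: Dict[str, float], k: int, rng: random.Random) -> List[str]:
    """Weighted sampling without replacement (Efraimidis-Spirakis keys u**(1/w)).

    High-weight pages are likely to be chosen, but low-ranked pages keep a
    nonzero chance, so replicas spread across popular and niche content.
    """
    keys = {page: rng.random() ** (1.0 / w) for page, w in weights.items()}
    return sorted(keys, key=keys.get, reverse=True)[:k]


if __name__ == "__main__":
    avg = {"home": 4.8, "maps": 4.5, "news": 3.9, "recipes": 2.1, "poetry": 1.2}
    weights = zipf_weights(avg)
    print(weights)  # home 1.0, maps 0.5, news ~0.333, recipes 0.25, poetry 0.2
    print(pick_pages_to_cache(weights, k=2, rng=random.Random(7)))
```

Because each device draws independently, different devices end up caching different subsets, so the community as a whole covers more pages than any one device could store.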

Datasets

YJMob100K — 25,000 individuals, 75 days of GPS trajectories sampled every 30 minutes — public mobility dataset from a Japanese city
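Trace-driven simulation needs a rule for when two devices can exchange data. A minimal sketch, assuming YJMob100K-style records of (user, day, 30-minute slot, x, y) and treating two users in the same cell during the same slot as being in contact, is shown below. The same-cell rule is a simplification chosen for illustration, since Bluetooth range is far shorter than a 500 m cell; the paper's actual contact model may differ, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# One trace record: (user_id, day, timeslot, x, y); timeslots are 30-minute bins.
Record = Tuple[int, int, int, int, int]


def encounters(records: Iterable[Record]) -> Dict[Tuple[int, int], List[Tuple[int, int]]]:
    """Group users by (day, timeslot, cell) and emit co-located user pairs.

    Returns a map from (day, timeslot) to the user pairs that could exchange
    data in that slot under the same-cell contact assumption.
    """
    occupants: Dict[Tuple[int, int, int, int], Set[int]] = defaultdict(set)
    for uid, day, slot, x, y in records:
        occupants[(day, slot, x, y)].add(uid)

    contacts: Dict[Tuple[int, int], List[Tuple[int, int]]] = defaultdict(list)
    for (day, slot, _x, _y), users in occupants.items():
        for a, b in combinations(sorted(users), 2):
            contacts[(day, slot)].append((a, b))
    return dict(contacts)


if __name__ == "__main__":
    trace = [
        (1, 0, 10, 5, 5),
        (2, 0, 10, 5, 5),  # same cell and slot as user 1: contact
        (3, 0, 10, 6, 5),  # neighboring cell: no contact in this model
        (3, 0, 11, 5, 5),
        (1, 0, 11, 5, 5),  # users 1 and 3 share a cell in the next slot
    ]
    print(encounters(trace))  # {(0, 10): [(1, 2)], (0, 11): [(1, 3)]}
```

In dense cells the number of pairs grows quadratically, so a full-scale simulator would likely sample or cap contacts per slot.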

Baselines vs proposed

Epidemic routing baseline: request satisfaction under 40% in a 1-week blackout vs over 95% for top pages with CttF
Latencies with epidemic routing exceeded several days for popular pages vs a median under 24 hours with CttF
Adversarial scenarios with 25% Sybil nodes: epidemic routing request satisfaction dropped by over 50% vs only 5% for CttF
Battery consumption for proof-of-work ranged from 11 mAh (1s PoW) to 390 mAh (1 min PoW) per rating exchange
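The two headline metrics in these comparisons, request satisfaction and median latency, can be computed from a request log as sketched below. The log format (request and delivery times in hours, `None` for undelivered requests) and the rule that a request counts only if it is served before the blackout horizon are assumptions for illustration.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple

# (time the page was requested, time it was delivered or None), in hours
Request = Tuple[float, Optional[float]]


def satisfaction_ratio(requests: Sequence[Request], horizon_h: float) -> float:
    """Fraction of requests served before the end of the blackout horizon."""
    if not requests:
        return 0.0
    served = sum(1 for _, done in requests if done is not None and done <= horizon_h)
    return served / len(requests)


def median_latency(requests: Sequence[Request], horizon_h: float) -> Optional[float]:
    """Median delivery delay over requests served within the horizon."""
    delays: List[float] = [done - asked for asked, done in requests
                           if done is not None and done <= horizon_h]
    return median(delays) if delays else None


if __name__ == "__main__":
    log = [(0.0, 6.0), (2.0, 30.0), (5.0, None), (10.0, 20.0)]
    print(satisfaction_ratio(log, horizon_h=24 * 7))  # 0.75
    print(median_latency(log, horizon_h=24 * 7))  # 10.0
```

Computing latency only over served requests, as here, means a system that serves few requests quickly can look better on latency than one that serves many slowly, which is why the two metrics are reported together.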

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.17245.

Fig 1: CttF’s workflow. Users cache, rate, and exchange page ratings pre-blackout,

Fig 2 (page 6).

Fig 3 (page 6).

Fig 4 (page 6).

Limitations

The system assumes users can pre-cache pages before a blackout; unexpected blackouts without preparation may limit effectiveness.
Page authenticity relies on trusted proxies reachable pre-blackout; proxy discovery and circumvention are out of scope and may limit practical deployment.
CttF does not attempt anonymity or identity protection, and Bluetooth communication leaks MAC addresses, which could be exploited by adversaries.
Simulations assume a 2.5 MB median page size and 3 GB of storage per device, which may be infeasible on some devices or in some contexts.
The evaluation does not include real-world deployment in hostile environments, tests with heavily adversarial user populations, or large-scale jamming experiments.
Adversarial modeling covers rating manipulation and jamming; content poisoning is not explored in detail.
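The roughly 3 GB per-device figure for manual seeders, who cache 1,200 pages each, follows directly from the 2.5 MB median page size:

```latex
1{,}200\ \text{pages} \times 2.5\ \tfrac{\text{MB}}{\text{page}} = 3{,}000\ \text{MB} \approx 3\ \text{GB}
```

Because 2.5 MB is a median rather than a mean, the actual footprint of 1,200 cached pages depends on the page-size distribution, so 3 GB is an estimate.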

Open questions / follow-ons

How might CttF perform in sudden blackout scenarios without adequate pre-caching opportunity?
What are scalable and censorship-resistant approaches to proxy discovery and trust bootstrapping in hostile regimes?
Can stronger privacy and anonymity guarantees be integrated within CttF’s design without sacrificing usability or performance?
What are the trade-offs of extending CttF to support dynamic content or authenticated user sessions during blackouts?

Why it matters for bot defense

From a bot-defense or CAPTCHA engineering perspective, Cache to the Future is an interesting application of distributed caching and peer-to-peer data delivery over intermittent short-range connectivity in adversarial settings. Its use of proof-of-work challenges to limit manipulation of community ratings is partly analogous to CAPTCHA's goal of hindering automated spam and Sybil attacks: both raise the cost of each automated contribution. The system's reliance on crowd-sourced trust without centralized identity highlights a challenge shared by bot mitigation, where trust establishment is limited. CttF also demonstrates the importance of redundancy and signature verification in disseminating authentic content, which parallels integrity needs in bot-resistant content delivery. While not a CAPTCHA system itself, the design offers insights into securing distributed reputation or rating mechanisms against manipulation and operating under censorship constraints, both key challenges in bot defense. Engineers building CAPTCHAs for blackout-resistant or censorship-circumvention tools might consider similar layered cryptographic and crowd-sourced validation strategies to keep automated attacks from degrading service availability or content integrity.

Cite

bibtex

@article{arxiv2606_17245,
  title={Cache to the Future: A Distributed Webpage Archive for Internet Blackouts},
  author={Ross Evans and Diogo Barradas},
  journal={arXiv preprint arXiv:2606.17245},
  year={2026},
  url={https://arxiv.org/abs/2606.17245}
}