Quieting the Cobwebs: Browser Interaction for Visual Floaters

Source: arXiv:2605.12739 · Published 2026-05-12 · By Kenneth Ge, Jinglin Li, Shikhar Ahuja

TL;DR

This work addresses visual floaters: semi-translucent, cobweb-like shadows in the eye that move and disrupt vision for roughly one third of the population. During screen use they degrade contrast, add clutter, and introduce motion distractions. The paper introduces a physiologically inspired simulation of floaters that incorporates neural adaptation and fluid dynamics to mimic how floaters move in response to eye movements. This simulation enables quantitative evaluation of how floater motion affects text readability across fonts and layouts through an OCR-based metric pipeline. Building on this analysis, the authors develop a Chrome extension that minimizes eye movements during browsing, using Rapid Serial Visual Presentation (RSVP) for text and a “world-in-hand” pan mode for other UI elements. Keeping the eyes still exploits neural adaptation, which fades stationary floaters, to maximize signal-to-noise ratio and reduce floater disruption. The extension works across websites and UI components without modifying their code. Results show that slow floater motion degrades OCR recognition less than fast motion, that a single-column text layout improves readability, and that font choice has a modest effect. The tool’s keyboard-driven design allows seamless mode switching and speed control, reducing saccades, which reset neural adaptation and make floaters visible again. Overall, the authors present this as the first work to combine simulation, quantitative analysis, and assistive interaction design for floater-related vision impairment among screen users.

Key findings

Approximately 33% of smartphone users worldwide are affected by visual floaters.
Floater motion harms screen reading by reducing contrast and adding distracting clutter, largely because the motion triggered by each eye movement resets neural adaptation.
The proposed simulation uses two-phase motion (an initial drift of ~3 s followed by settling over ~9 s) and models fading at low velocity to simulate neural adaptation.
OCR readability metrics show slow-moving floaters yield significantly lower word error rate (WER 0.8085) and character error rate (CER 0.7591) compared to fast-moving floaters (WER 0.8707, CER 0.7734), p=0.0002.
Gill Sans and Tahoma fonts produced the best OCR results under floater occlusion (WER ~0.88), while other sans-serif fonts clustered higher (up to 0.904).
A single-column text layout yielded the lowest OCR word error rate (0.895), outperforming wide-spaced (0.915), two-column (0.99), and narrow single-column (0.985) layouts.
The RSVP reading interface, with Optimal Recognition Point (ORP) anchoring, and a pan mode for UI elements both reduce saccadic eye movements, so neural adaptation is not repeatedly reset and floaters stay faded, improving effective visual clarity.
The pan mode uses the Pointer Lock API and translate3d transforms for smooth 60 fps panning that is independent of native scrolling quirks.
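The pan behavior in the last finding can be sketched as a small pure state update. This is a minimal sketch under stated assumptions, not the extension's code: the names PanState, applyMovement, zoomAbout and toTransform and the gain parameter are hypothetical. Under pointer lock, browsers report relative mouse motion in MouseEvent.movementX and movementY; a content script would feed those deltas in and write the resulting string to a wrapper element's style.transform (with transform-origin: 0 0), typically inside requestAnimationFrame.

```typescript
// Hypothetical pan/zoom state for a "world-in-hand" view: the page content
// moves with the hand, so a rightward mouse motion shifts the content right.
interface PanState {
  x: number;     // horizontal translation in CSS pixels
  y: number;     // vertical translation in CSS pixels
  scale: number; // zoom factor (1 = native size)
}

// Apply one pointer-locked mousemove. Under pointer lock the browser reports
// relative motion (movementX / movementY), so the hidden cursor never stops
// at a screen edge. `gain` is an assumed sensitivity multiplier.
function applyMovement(s: PanState, movementX: number, movementY: number, gain = 1): PanState {
  return { ...s, x: s.x + gain * movementX, y: s.y + gain * movementY };
}

// Zoom by `factor` while keeping screen point (px, py) fixed, e.g. the
// viewport centre. Assumes transform-origin: 0 0, so a content point c is
// drawn at x + scale * c; we solve for the new x that keeps it under px.
function zoomAbout(s: PanState, factor: number, px: number, py: number): PanState {
  return {
    x: px - (px - s.x) * factor,
    y: py - (py - s.y) * factor,
    scale: s.scale * factor,
  };
}

// CSS transform string. translate3d is typically GPU-composited, which is
// why it is a common choice for smooth 60 fps movement.
function toTransform(s: PanState): string {
  return `translate3d(${s.x}px, ${s.y}px, 0) scale(${s.scale})`;
}
```

Keeping the state update pure makes it independent of the page's own scrolling: the page never scrolls, it is only translated, which is one way to sidestep the native and virtualized scrolling incompatibilities the paper mentions.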

Threat model

The adversary is a physiological phenomenon: vitreous floaters, translucent moving shadows that disrupt vision by introducing motion noise and occlusion, particularly after each eye movement resets neural adaptation. Floaters cannot be removed non-invasively or corrected with conventional vision aids; instead, the system assists users by minimizing saccadic eye movements to reduce floater-induced disruption.

Methodology — deep read

The authors proceed methodically, starting from a threat model centered on individuals whose visual floaters cast moving shadows that degrade vision during screen use. The key challenge is that every eye movement sets the floaters in motion and resets neural adaptation, which increases clutter and reduces reading contrast.

The simulation is grounded in the literature on vitreous opacity optics, the fluid dynamics of the vitreous humor, and neural adaptation timing (approximately 80 ms), combined with self-observation experiments by a coauthor under controlled brightness (500 nits). The simulation domain is a white canvas with a 3:4 aspect ratio representing the visual field, containing multiple cobweb-like floaters modeled as elastic, deformable chains of dark joints.

Floater motion is simulated in two phases: an initial randomized velocity matching the drift induced by an eye movement (~3 seconds), followed by a downward settling phase (~9 seconds). The chains are simulated with XPBD (Extended Position Based Dynamics), with velocity damping based on non-Newtonian fluid dynamics. Neural adaptation is modeled by fading a floater's opacity when its velocity falls below a threshold, so floaters visually disappear during fixation.
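The two-phase motion and adaptation fading can be illustrated with a toy XPBD step for a single floater. This is a sketch, not the authors' simulator: all names and constants are hypothetical except the ~3 s drift phase and the ~80 ms adaptation timescale; simple exponential drag stands in for the paper's non-Newtonian damping; and coordinates are normalized canvas units with y pointing down.

```typescript
// Toy XPBD model of one cobweb-like floater: a chain of joints joined by
// compliant distance constraints. Coordinates: x right, y down.
type Vec = { x: number; y: number };

interface Floater {
  pos: Vec[];      // joint positions
  prev: Vec[];     // joint positions at the start of the current step
  vel: Vec[];      // joint velocities
  rest: number;    // rest length between neighbouring joints
  opacity: number; // 1 = fully visible, 0 = adapted away
}

// Assumed parameters (only the ~3 s phase and ~80 ms timescale come from the text).
const DRIFT_S = 3;             // phase 1 length: drift caused by the eye movement
const SETTLE_ACCEL = 0.02;     // phase 2: downward settling acceleration
const DAMPING = 1.5;           // 1/s, linear drag standing in for non-Newtonian damping
const COMPLIANCE = 1e-4;       // XPBD compliance (inverse stiffness)
const SPEED_THRESHOLD = 0.02;  // below this mean speed the floater fades
const ADAPT_TAU = 0.08;        // s, time constant of the opacity change

function makeFloater(n: number, start: Vec, rest: number, drift: Vec): Floater {
  const pos = Array.from({ length: n }, (_, i) => ({ x: start.x + i * rest, y: start.y }));
  return { pos, prev: pos.map((p) => ({ ...p })), vel: pos.map(() => ({ ...drift })), rest, opacity: 1 };
}

function step(f: Floater, t: number, dt: number): void {
  const n = f.pos.length;
  // 1. Damping and external forces, then predict new positions.
  const decay = Math.exp(-DAMPING * dt);
  for (let i = 0; i < n; i++) {
    const v = f.vel[i];
    v.x *= decay;
    v.y *= decay;
    if (t >= DRIFT_S) v.y += SETTLE_ACCEL * dt; // settling phase pulls downward
    f.prev[i] = { ...f.pos[i] };
    f.pos[i] = { x: f.pos[i].x + v.x * dt, y: f.pos[i].y + v.y * dt };
  }
  // 2. One XPBD pass over the distance constraints (unit inverse masses, so
  //    w_i + w_j = 2; with one iteration the accumulated lambda starts at 0).
  const alphaTilde = COMPLIANCE / (dt * dt);
  for (let i = 0; i + 1 < n; i++) {
    const a = f.pos[i], b = f.pos[i + 1];
    const dx = a.x - b.x, dy = a.y - b.y;
    const len = Math.hypot(dx, dy);
    if (len === 0) continue;
    const dLambda = -(len - f.rest) / (2 + alphaTilde);
    const nx = dx / len, ny = dy / len;
    a.x += dLambda * nx; a.y += dLambda * ny;
    b.x -= dLambda * nx; b.y -= dLambda * ny;
  }
  // 3. Recover velocities from the corrected positions.
  let meanSpeed = 0;
  for (let i = 0; i < n; i++) {
    f.vel[i] = { x: (f.pos[i].x - f.prev[i].x) / dt, y: (f.pos[i].y - f.prev[i].y) / dt };
    meanSpeed += Math.hypot(f.vel[i].x, f.vel[i].y) / n;
  }
  // 4. Neural adaptation: fade toward 0 when nearly still, back toward 1 when moving.
  const target = meanSpeed < SPEED_THRESHOLD ? 0 : 1;
  f.opacity += (target - f.opacity) * (1 - Math.exp(-dt / ADAPT_TAU));
}
```

Stepping at a fixed dt (for example 1/60 s) and drawing each joint with alpha equal to opacity gives a floater that is visible while it drifts quickly and fades as it slows and settles; an eye movement would be modeled by assigning a fresh drift velocity, which pushes the speed back above the threshold and restores visibility.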

A computational readability pipeline overlays the time-averaged output of the floater simulation on text rendered in varying fonts and layouts. OCR engines then estimate character and word error rates as a proxy for reading clarity under each condition. The font set comprises six common sans-serif fonts, and the layouts are single-column, wide-spaced, two-column, and narrow single-column.
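The summary does not name the OCR engines, but WER and CER have standard definitions: the Levenshtein edit distance (substitutions, deletions, insertions) between the OCR output and the ground-truth text, divided by the length of the ground truth in words or characters. A minimal sketch follows; the function names are hypothetical, and whitespace tokenization for words is an assumption.

```typescript
// Levenshtein edit distance between two token sequences (words or characters).
function editDistance<T>(ref: readonly T[], hyp: readonly T[]): number {
  // prev[j] = distance between the first i-1 ref tokens and the first j hyp tokens.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const cur = [i]; // i deletions turn i ref tokens into an empty hypothesis
    for (let j = 1; j <= hyp.length; j++) {
      const sub = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      cur.push(Math.min(sub, prev[j] + 1, cur[j - 1] + 1)); // substitute, delete, insert
    }
    prev = cur;
  }
  return prev[hyp.length];
}

// WER = word-level edits / number of reference words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  return editDistance(ref, hyp) / Math.max(ref.length, 1);
}

// CER = character-level edits / number of reference characters.
function charErrorRate(reference: string, hypothesis: string): number {
  const ref = Array.from(reference);
  return editDistance(ref, Array.from(hypothesis)) / Math.max(ref.length, 1);
}
```

Because insertions count as errors, both rates can exceed 1 when the OCR output is longer than the reference, so values near 0.8 to 0.99 indicate heavy but not total degradation.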

Based on these insights, the authors build a web extension on Chrome Manifest V3 that injects content scripts and UI components without modifying the original webpages. RSVP reading is implemented as a keyboard-triggered modal that displays words one at a time at a fixed central position, highlighting each word's Optimal Recognition Point to minimize horizontal flicker. Pan mode offers smooth mouse-controlled panning and zooming via the Pointer Lock API and translate3d transforms, circumventing the limitations of native scrolling.
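The ORP anchoring described above can be sketched as a few pure functions. This is an illustration under assumptions, not the extension's code: the summary does not give the ORP rule, so a simple length-based heuristic is used, and orpIndex, splitAtOrp, alignToPivot, displayMs and the doubled pause after sentence-final punctuation are all hypothetical.

```typescript
// Index of the Optimal Recognition Point (ORP) letter, using an assumed
// length-based heuristic: slightly left of centre, capped for long words.
function orpIndex(word: string): number {
  const n = word.length;
  if (n <= 1) return 0;
  if (n <= 5) return 1;
  if (n <= 9) return 2;
  if (n <= 13) return 3;
  return 4;
}

// Split a word into [before ORP, ORP letter, after ORP] so the ORP letter
// can be highlighted and pinned to a fixed screen position.
function splitAtOrp(word: string): [string, string, string] {
  const i = orpIndex(word);
  return [word.slice(0, i), word.charAt(i), word.slice(i + 1)];
}

// Monospace rendering: left-pad so every word's ORP letter lands in column
// `pivotCol`, so the eye never has to move horizontally between words.
function alignToPivot(word: string, pivotCol: number): string {
  return " ".repeat(Math.max(pivotCol - orpIndex(word), 0)) + word;
}

// Display time per word at `wpm` words per minute, with an assumed doubled
// pause after sentence-ending punctuation.
function displayMs(word: string, wpm: number): number {
  const base = 60000 / wpm;
  return /[.!?]$/.test(word) ? base * 2 : base;
}
```

In a browser, the three parts from splitAtOrp would typically be rendered as separate spans, with the middle one colored and positioned at the screen center, and the next word scheduled after displayMs milliseconds.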

The keybindings (Q, W, S, D, Space, Esc) cluster on the left side of the keyboard for one-handed use, double-tap activation reduces accidental mode entry, and adaptive speed controls let users tailor pacing to their comprehension needs.
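The double-tap activation and adjustable pacing can be sketched as follows. The summary does not say which key triggers which action or how speed steps are sized, so the 300 ms window, the 25 wpm step, the 100 to 1000 wpm range and the names DoubleTapDetector and adjustWpm are assumptions.

```typescript
// Hypothetical double-tap detector: an action fires only when the same key is
// pressed twice within `windowMs`, so a single stray press does nothing.
class DoubleTapDetector {
  private lastKey: string | null = null;
  private lastTime = -Infinity;
  private readonly windowMs: number;

  constructor(windowMs = 300) {
    this.windowMs = windowMs;
  }

  // Feed every keydown (e.g. KeyboardEvent.key and event.timeStamp in ms);
  // returns true when this press completes a double tap.
  press(key: string, timeMs: number): boolean {
    const isDouble = key === this.lastKey && timeMs - this.lastTime <= this.windowMs;
    // After a double tap, reset so a third quick press does not fire again.
    this.lastKey = isDouble ? null : key;
    this.lastTime = isDouble ? -Infinity : timeMs;
    return isDouble;
  }
}

// Adjust reading speed in fixed steps, clamped to an assumed range.
function adjustWpm(current: number, direction: 1 | -1, step = 25, min = 100, max = 1000): number {
  return Math.min(max, Math.max(min, current + direction * step));
}
```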

The evaluation statistically compares OCR error rates between slow and fast floater conditions (p=0.0002) and across font and layout variations. The tool itself is validated only qualitatively, through interface smoothness and experimental observations. The source code and extension availability are mentioned, but reproducibility details such as open dataset sharing are not specified. No user study results are reported yet; a study is planned.
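The summary reports p=0.0002 for slow versus fast floaters without naming the test. Purely as an illustration, Welch's two-sample t-test is one common choice for comparing two sets of per-sample error rates; the sketch below computes its statistic and Welch-Satterthwaite degrees of freedom. Turning t into a p-value needs a Student t CDF (for example from a statistics library), which is omitted here, and the function names are hypothetical.

```typescript
// Arithmetic mean of a sample.
function mean(xs: readonly number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Unbiased sample variance (n - 1 denominator).
function variance(xs: readonly number[]): number {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

// Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
// independent samples with possibly unequal variances and sizes.
function welch(a: readonly number[], b: readonly number[]): { t: number; df: number } {
  const va = variance(a) / a.length; // squared standard error of mean(a)
  const vb = variance(b) / b.length; // squared standard error of mean(b)
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 / (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}
```

With per-image WER values for slow and fast floaters as the two samples, a negative t would indicate lower error under slow floaters, matching the direction reported above.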

Overall, the empirical workflow progresses from biologically-inspired physical simulation, through computational vision metrics, culminating in practical assistive browser interaction design grounded in the cognitive neuroscience of neural adaptation.

Technical innovations

A novel 2D simulation of eye floaters integrating non-Newtonian vitreous fluid dynamics, two-phase motion, shape deformation, and neural adaptation fading.
A computational readability pipeline using OCR metrics on time-averaged simulated floater overlays to quantitatively assess text readability under dynamic occlusion and motion.
A browser extension combining Rapid Serial Visual Presentation (RSVP) with Optimal Recognition Point anchoring and a world-in-hand pan mode to minimize eye saccades across all UI elements, not just text.
Use of Pointer Lock API and translate3d transforms for smooth, consistent pan and zoom navigation that bypasses native and virtualization scroll incompatibilities.
A keyboard-first, muscle-memory-based control scheme with double-tap activation designed specifically to reduce friction and eye movement in visually impaired web navigation.

Datasets

Simulated floater overlays: synthetic videos generated by the physics-based model. No external dataset is used or required.

Baselines vs proposed

Font choice baseline: Gill Sans WER = 0.877 vs Avenir WER = 0.904
Layout baseline: Single column WER = 0.895 vs Two columns WER = 0.990
Floater motion baseline: Slow floater OCR confidence = 0.7751, CER = 0.7591, WER = 0.8085 vs Fast floater OCR confidence = 0.6602, CER = 0.7734, WER = 0.8707 (p=0.0002)

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.12739.

Fig 1: Simulation output (left) and resulting clarity plot (right).

Fig 2: Fast floaters (left), compared to slow floaters (right), on randomly generated text

Fig 3: Extension interface, with the RSVP controls in a dropdown menu (left), RSVP display

Fig 4 (page 5).

Limitations

Simulation abstracts macro-level floater behavior and neglects finer floater shape variations seen in human-drawn samples; realism tradeoffs exist.
Evaluation relies on OCR metrics as proxy for human reading clarity rather than direct user studies with floater-affected participants.
No adversarial or robustness testing under varying lighting conditions, screen types, or device classes was reported.
The tool’s usability and efficacy have not been validated yet in large-scale or longitudinal studies, pending future work.
Neural adaptation modeling is based on fixed thresholds and simple fading; individual variability and other visual phenomena may not be captured fully.
Limited public access to the extension code and simulations may hinder reproducibility or external validation.

Open questions / follow-ons

How well does reduced saccade interaction translate to improved reading speed and comprehension in actual users with floaters?
Can the simulation model be extended to 3D or incorporate individual-specific floater shape/position data?
How might other assistive technologies such as contrast enhancement or color manipulation synergize with eye movement minimization?
What are the long-term adoption patterns and ergonomic implications of keyboard-driven interaction for floater-affected users?

Why it matters for bot defense

While this paper is not a bot-defense or CAPTCHA paper per se, it introduces a novel human-computer interaction paradigm aimed at users with specific visual impairments caused by floaters. From a bot-defense perspective, this work highlights the importance of considering diverse human visual and cognitive abilities when designing interaction challenges or accessibility accommodations. Techniques that reduce required eye or cursor movements could influence how users engage with CAPTCHA mechanisms or other challenge-response tests on the web, particularly for those with obstructed or impaired vision. Additionally, the paper’s approach to simulating perceptual distortions and assessing readability under occlusions may inspire analogous evaluation methodologies in CAPTCHA design to assess human accessibility under noisy or adversarial visual conditions.

Cite

bibtex

@article{arxiv2605_12739,
  title={Quieting the Cobwebs: Browser Interaction for Visual Floaters},
  author={Kenneth Ge and Jinglin Li and Shikhar Ahuja},
  journal={arXiv preprint arXiv:2605.12739},
  year={2026},
  url={https://arxiv.org/abs/2605.12739}
}
