The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems

Source: arXiv:2605.19638 · Published 2026-05-19 · By Rizwan Jahangir, Daisuke Ishii

TL;DR

This paper asks how far AI-driven accessibility systems can extend their operational capabilities. The authors introduce the Accessibility Capability Boundary (ACB), a formal, multidimensional framework that models accessibility not as a binary compliance criterion but as a dynamic capability space constrained by measurable factors such as deployment latency, cognitive load, infrastructure dependency, offline persistence, interaction complexity, and adaptability. They argue that AI-generated, browser-native accessibility systems, delivered as single-file HTML artifacts built on standardized browser APIs, can shift the ACB outward by sharply reducing deployment friction and enabling rapid, context-specific interface adaptation.

The work grounds the ACB framework in two real-world exploratory prototypes: (1) an AI-generated browser-native interface for a blind user in Nepal, and (2) an open-source webcam alignment assistant designed for visually impaired users that uses client-side face detection and audio feedback for camera positioning. Through formal definitions, propositions, and a comparative evaluation matrix, the paper characterizes which regions of the accessibility capability space are reachable by AI-generated browser-native systems versus traditional assistive technologies. It identifies hard computational and infrastructural constraints that define the limits of this AI-generated accessibility paradigm. The work thereby provides a theoretical foundation for understanding the operational limits and expansion potential of autonomous accessibility systems and proposes a roadmap for future research in accessible AI interfaces.

Key findings

AI-generated browser-native accessibility systems reduce deployment latency (Ld) from 0.85 (traditional AT installation) to 0.05 (URL load) in normalized units, a 17-fold drop in the normalized score (Table 1).
Infrastructure dependency (Di) falls from 0.90 for traditional assistive technology to 0.10 for AI-generated browser-native systems, which rely on browser caching and Service Workers for offline use.
Adaptability (Ad) rises from 0.10 for manually developed systems to 0.90, owing to rapid LLM-driven interface regeneration.
Interaction complexity (Cx), measured as discrete steps to task completion, declines from 15 steps for traditional AT setup to 2 steps for AI-generated browser-native interfaces.
Offline persistence (Po) reaches about 0.80 for AI-generated systems via Service Workers, slightly below traditional local installs (0.90).
Benchmarks of the webcam alignment assistant show an initial load time of 850 ms on a low-tier smartphone profile, face detection latency of 110 ms, and CPU utilization under 20% (Table 2), indicating feasibility on constrained devices.
Automated accessibility audits recorded 100% WCAG compliance on static HTML structure; however, manual screen reader testing revealed the need to debounce ARIA live region updates to avoid cognitive overload.
The client-side webcam alignment system demonstrates that infrastructure dependency minimization comes at a cost of slightly lower face detection accuracy compared with server-side models.
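
The live-region finding above can be illustrated with a small rate limiter. This is a minimal sketch, not the paper's implementation: it uses a throttle with duplicate suppression (a close relative of debouncing, chosen here because it needs no timers), and the names `createAnnouncer`, `emit`, and `minIntervalMs` are hypothetical.

```typescript
// Hypothetical helper: limits how often guidance text reaches an ARIA live region.
// Technique: throttle plus duplicate suppression (not a timer-based debounce).
type Emit = (message: string) => void;

function createAnnouncer(
  emit: Emit,
  minIntervalMs: number,
  now: () => number = () => Date.now(),
): (message: string) => boolean {
  let lastMessage: string | null = null;
  let lastTime = Number.NEGATIVE_INFINITY;

  return (message: string): boolean => {
    const t = now();
    // Drop repeats of the most recently announced message.
    if (message === lastMessage) return false;
    // Drop updates that arrive too soon after the previous announcement.
    if (t - lastTime < minIntervalMs) return false;
    lastMessage = message;
    lastTime = t;
    emit(message);
    return true;
  };
}
```

In a page, `emit` would typically set the `textContent` of an element marked `aria-live="polite"`. The interval is an assumption to be tuned with screen reader users, since the source reports the need for debouncing but not a specific value.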

Threat model

The paper assumes no malicious adversary. The "adversary" is instead the combination of infrastructural constraints (limited bandwidth, hardware capability, and unreliable connectivity) and functional user impairments (visual, motor, cognitive, hearing). The threat model therefore concerns operational limits imposed by user abilities, environment, and system constraints, not active attacks. The system is designed to minimize deployment and usage friction so that accessibility reaches users in these difficult contexts.

Methodology — deep read

Threat Model & Assumptions: The adversary model is implicit. Users with disabilities (visual impairment, motor limitations, etc.) interact with digital content through assistive technologies in varied environments, including low-connectivity and low-resource contexts. The environment is treated as hostile only in the sense of infrastructural constraints (limited bandwidth, limited device capabilities); the model evaluates operational bounds, not security threats.
Data: The work uses no large labeled datasets but deploys two real-world accessibility probes: (a) an AI-generated browser-native interface for a blind user in Nepal, synthesized in one LLM generation pass with documented prompts; (b) an open-source webcam alignment assistant implementing client-side face detection using MediaPipe FaceMesh WASM and Web Speech API. Empirical evaluation uses simulated hardware profiles and automated accessibility audit tools (axe-core, WAVE, Lighthouse).
Architecture / Algorithm: The system pipeline starts with a natural-language description of an accessibility need, which the LLM (Claude) synthesizes into a self-contained HTML/CSS/JavaScript artifact embedding ARIA semantics, offline-first Service Worker caching, and browser APIs such as MediaDevices (camera), Web Speech (speech synthesis), Pointer Events, and the Vibration API. The webcam alignment assistant implements a closed-loop pipeline: camera feed → WASM face landmark detection → spatial analysis → debounced audio guidance output via the Web Speech API. The artifact runs entirely client-side in a browser sandbox with no external backend dependencies after initial load.
Training Regime: Not applicable; no ML training was performed. LLM generation uses a low sampling temperature (T=0.2) to reduce generative variability and make interface outputs more reproducible.
Evaluation Protocol: Evaluation combines quantitative constraint vectors comparing traditional AT with the AI-generated probes across accessibility dimensions (Table 1), performance benchmarks across simulated device profiles with CPU and network throttling (Table 2), and qualitative heuristic observations of usability and cognitive load. Accessibility verification uses automated auditing tools followed by manual screen reader testing to catch dynamically induced semantic failures. Future work proposes a within-subjects user study measuring task completion, System Usability Scale, and NASA-TLX cognitive load, but this study has not yet been conducted.
Reproducibility: The webcam alignment probe is open source, with full deployment instructions and documented generation prompts. The system is encapsulated in a single-file HTML artifact that runs in modern standard browsers without installation. The LLM prompt seeds and low-temperature sampling settings are provided. No proprietary or closed datasets are involved.
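
The offline-first caching mentioned under Architecture can be sketched as a cache-first lookup. This is an illustration under stated assumptions, not the paper's Service Worker code: `SimpleCache`, `Fetcher`, `MemoryCache`, and `cacheFirst` are hypothetical stand-ins for the browser Cache and fetch APIs so the logic can run outside a browser.

```typescript
// Hypothetical minimal interfaces standing in for the browser Cache and fetch APIs.
interface SimpleCache {
  match(url: string): Promise<string | undefined>;
  put(url: string, body: string): Promise<void>;
}
type Fetcher = (url: string) => Promise<string>;

interface CachedResponse {
  body: string;
  source: "cache" | "network";
}

// Cache-first strategy: serve a stored copy when one exists,
// otherwise fetch once and store the result for later offline use.
async function cacheFirst(
  url: string,
  cache: SimpleCache,
  fetcher: Fetcher,
): Promise<CachedResponse> {
  const cached = await cache.match(url);
  if (cached !== undefined) return { body: cached, source: "cache" };
  const body = await fetcher(url); // rejects when offline and nothing is cached
  await cache.put(url, body);
  return { body, source: "network" };
}

// In-memory cache used only to exercise the sketch.
class MemoryCache implements SimpleCache {
  private store = new Map<string, string>();
  async match(url: string): Promise<string | undefined> {
    return this.store.get(url);
  }
  async put(url: string, body: string): Promise<void> {
    this.store.set(url, body);
  }
}
```

The sketch also shows the limit noted in the paper: a resource that was never loaded while online cannot be served offline, which is why persistence holds only after the first load.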

Example End-to-End: A blind user requiring webcam alignment opens the URL for the assistant in a standards-compliant browser. The system requests camera permission, runs WASM-based face landmark detection purely client-side, analyzes face position relative to the frame, and provides real-time audio guidance synthesized locally. The entire interaction requires no installation, persists offline after first load, and adapts rapidly to user needs or localization by regenerating the interface with LLM prompts.
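
The "analyzes face position relative to the frame" step in this walkthrough can be sketched as a pure function. This is a minimal illustration, not the assistant's actual logic: the `Box` shape, the normalized coordinates, the `tolerance` value, and the message strings are all assumptions.

```typescript
// Hypothetical face bounding box in normalized frame coordinates:
// (0, 0) is the top-left corner of the frame and (1, 1) the bottom-right.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Maps a detected face box to one short guidance message.
// Messages describe where the face sits in the frame instead of telling the
// user which way to move, because the correct direction depends on whether
// the camera image is mirrored.
function guidanceFor(face: Box | null, tolerance = 0.1): string {
  if (face === null) return "No face detected";
  const dx = face.x + face.width / 2 - 0.5; // horizontal offset from frame center
  const dy = face.y + face.height / 2 - 0.5; // vertical offset from frame center
  if (Math.abs(dx) <= tolerance && Math.abs(dy) <= tolerance) return "Centered";
  // Report only the larger offset so the user gets one instruction at a time.
  if (Math.abs(dx) >= Math.abs(dy)) {
    return dx < 0 ? "Face is left of center" : "Face is right of center";
  }
  return dy < 0 ? "Face is above center" : "Face is below center";
}
```

In a running page, the returned string would be passed to speech synthesis or a live region after rate limiting, so that per-frame detections do not produce a stream of overlapping announcements.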

Technical innovations

Formulation of Accessibility Capability Boundary (ACB) as a multidimensional, formal framework modeling accessibility as a capability space rather than binary compliance.
Demonstration that AI-generated, browser-native accessibility systems reduce deployment latency and infrastructure dependency via single-file HTML artifacts leveraging standard browser APIs and Service Workers.
Closed-loop browser-native webcam alignment assistant integrating client-side WASM face detection, spatial analysis, and real-time audio feedback without external dependencies.
Accessibility verification pipeline combining automated WCAG audits with manual screen reader testing to address semantic failures unique to dynamic AI-generated interfaces.

Baselines vs proposed

Traditional AT: Deployment Latency (Ld) = 0.85 vs AI-generated browser-native: 0.05
Traditional AT: Infrastructure Dependency (Di) = 0.90 vs AI-generated browser-native: 0.10
Traditional AT: Cognitive Load (Lc) = 0.60 vs AI-generated browser-native: 0.25
Traditional AT: Adaptability (Ad) = 0.10 vs AI-generated browser-native: 0.90
Traditional AT: Offline Persistence (Po) = 0.90 vs AI-generated browser-native: 0.80
Webcam alignment assistant initial load time on mobile: 850 ms vs an unaided baseline of roughly several seconds (baseline not measured; preliminary heuristic estimate only)
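
The normalized comparison above can be expressed as two constraint vectors and a signed per-dimension improvement. This is a sketch for reading the numbers, not the paper's formal definition of the ACB. The direction of each dimension (lower is better for Ld, Di, Lc; higher is better for Ad, Po) is an assumption inferred from how the findings are described, and Cx is omitted because it is reported as a step count, not a normalized score.

```typescript
type Dimension = "Ld" | "Di" | "Lc" | "Ad" | "Po";
type ConstraintVector = Record<Dimension, number>;

// Normalized values as listed in the comparison above.
const traditionalAT: ConstraintVector = { Ld: 0.85, Di: 0.9, Lc: 0.6, Ad: 0.1, Po: 0.9 };
const aiGenerated: ConstraintVector = { Ld: 0.05, Di: 0.1, Lc: 0.25, Ad: 0.9, Po: 0.8 };

// Assumption: which direction counts as "better" for each dimension.
const lowerIsBetter: Record<Dimension, boolean> = {
  Ld: true,
  Di: true,
  Lc: true,
  Ad: false,
  Po: false,
};

// Positive values mean the candidate is better than the baseline on that dimension.
function improvement(
  base: ConstraintVector,
  candidate: ConstraintVector,
): Record<Dimension, number> {
  const out = {} as Record<Dimension, number>;
  for (const d of Object.keys(base) as Dimension[]) {
    const delta = candidate[d] - base[d];
    const signed = lowerIsBetter[d] ? -delta : delta;
    out[d] = Math.round(signed * 100) / 100; // round away floating-point noise
  }
  return out;
}

const gains = improvement(traditionalAT, aiGenerated);
```

Under these assumptions, `gains` is positive on four dimensions and negative on Po, matching the observation that offline persistence is the one listed dimension where traditional local installs still lead.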

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.19638.

Fig 4: The browser-native interface during active tracking. The UI relies on ARIA-labeled semantic

Limitations

Empirical evaluation is preliminary and exploratory without large-scale user studies or statistically powered human trials.
Face detection accuracy of client-side lightweight JavaScript libraries is lower than server-based deep learning models, limiting precision in spatial guidance.
Offline persistence and adaptability trade off against each other: regenerating an interface requires network access for LLM inference, so not all accessibility constraints can be maximized simultaneously.
Automated accessibility audits cannot fully capture semantic and dynamic failures inherent to AI-generated interfaces, necessitating manual screen reader validation.
Localization (Lz) and assistive technology compatibility (Ac) are improved but not yet equivalent to traditional fully installed solutions.
The approach assumes availability of modern standards-compliant browsers with required APIs, which may not hold in all low-resource environments.
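
The browser-availability assumption in the last limitation can be checked at load time with feature detection. The sketch below is an assumption-laden illustration: `EnvLike` and `missingCapabilities` are hypothetical names, and the four checks cover only the APIs named in this summary (camera access, speech synthesis, Service Workers, WebAssembly).

```typescript
// Hypothetical shape describing only the globals this check inspects.
interface EnvLike {
  navigator?: {
    mediaDevices?: { getUserMedia?: unknown };
    serviceWorker?: unknown;
  };
  speechSynthesis?: unknown;
  WebAssembly?: unknown;
}

// Returns a human-readable list of capabilities the environment lacks.
function missingCapabilities(env: EnvLike): string[] {
  const missing: string[] = [];
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    missing.push("camera access (MediaDevices.getUserMedia)");
  }
  if (env.speechSynthesis === undefined) {
    missing.push("speech synthesis (Web Speech API)");
  }
  if (env.navigator?.serviceWorker === undefined) {
    missing.push("offline caching (Service Worker)");
  }
  if (typeof env.WebAssembly !== "object") {
    missing.push("WebAssembly");
  }
  return missing;
}
```

In a page, the function would be called with the page's global object, and a non-empty result could be announced to the user in place of a silent failure.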

Open questions / follow-ons

How well do AI-generated browser-native accessibility systems perform across diverse real-world users and ability profiles in controlled human-subjects evaluations?
Can client-side computer vision accuracy approach server-grade performance without increasing infrastructure demands?
What are the tradeoffs in accessibility utility when dynamically adapting interfaces for multiple concurrent disabilities or evolving user needs?
How can automated accessibility verification be improved to reliably catch semantic-level failures in dynamically generated interfaces?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, this work offers a conceptual and practical framework for understanding the operational capabilities and constraints of AI-generated, client-side accessible interfaces. The Accessibility Capability Boundary (ACB) provides measurable dimensions (latency, cognitive load, adaptability) relevant to designing inclusive human-computer interaction flows that minimize user friction. The emphasis on browser-native, offline-first architectures suggests promising directions for deploying low-friction verification or assistance tools directly in constrained user environments, including those affected by poor connectivity or resource scarcity.

However, the identified constraints around dynamic interface generation, semantic accessibility verification, and infrastructure limitations mirror challenges in bot-detection scenarios: maintaining robust, user-friendly interaction while avoiding overload and preserving privacy and security. Developers considering accessibility-aware bot defenses could use the ACB framework to balance adaptability and usability while assessing deployment tradeoffs relevant to diverse global populations.

Cite

bibtex

@article{arxiv2605_19638,
  title={The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems},
  author={Rizwan Jahangir and Daisuke Ishii},
  journal={arXiv preprint arXiv:2605.19638},
  year={2026},
  url={https://arxiv.org/abs/2605.19638}
}
