SearchLog: A Web Browser Extension for Capturing Search Logs in Laboratory Studies

Source: arXiv:2606.05040 · Published 2026-06-03 · By Jiaman He, Riccardo Xia, Dana McKay, Damiano Spina, Johanne R. Trippas

TL;DR

This paper introduces SearchLog, a Chromium-based web browser extension for capturing naturalistic search logs while participants in laboratory studies search the open web. Unlike prior research tools that rely on custom search interfaces or limited logging features, SearchLog records a rich, structured set of browser and page-level interactions: clicks, scrolling, mouse movements, hovered text, typed inputs, search queries, result rankings, multi-tab and window activity, and AI-generated summaries when available. The collected data are stored locally as ordered JSON event streams accompanied by HTML snapshots and preprocessed search result data, enabling detailed post-hoc analysis of search behavior in both traditional and AI-enhanced web search scenarios.

Through a technical validation experiment on Google and Bing search engines, the authors demonstrate that SearchLog reliably captures diverse user interactions and correctly manages session boundaries while protecting sensitive inputs such as passwords. The tool supports multiple Chromium browsers, offers an easy installation and usage workflow, and fills a gap left by existing logging systems that either lack extensibility for commercial search engines or omit rich interaction features critical for modern search behavior research. SearchLog thus provides a reusable, researcher-friendly platform to study the evolving dynamics of web search and human-AI interaction under naturalistic laboratory conditions.

Key findings

  • SearchLog captures a broad range of interactions: clicks, scrolling, mouse movements, hovered text, typed text, submitted queries, result rankings, tab/window operations, and AI-generated summaries.
  • The system supports multiple Chromium browsers (Google Chrome, Microsoft Edge, Opera, and Brave), enabling flexible deployment in lab studies.
  • SearchLog logs are stored locally in ordered JSON event streams with accompanying HTML snapshot files and parsed search result rankings for detailed analysis.
  • The technical validation covered six scenarios (basic search, multi-tab, mouse, keyboard, AI-summary capture, session management) and confirmed correct and complete data capture in each.
  • Sensitive inputs such as password fields are detected and masked in logs to protect privacy.
  • SearchLog currently supports two major search engines (Google and Bing) with structured extraction of AI-generated summaries such as Google AI Overviews and Bing Copilot dialogs.
  • The extension automatically timestamps and organizes event logs by participant, session, and task metadata, facilitating analysis of search behaviors such as query reformulation, tab switching, dwell time, and scroll depth.
  • SearchLog requires maintenance to handle search engine UI changes, especially for accurate extraction of rankings and AI content.
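
The analyses mentioned in the findings above (dwell time, query reformulation) can be derived from an ordered event stream. The following is a minimal sketch, not SearchLog's own analysis code: the field names `timestamp`, `action`, and `payload`, the action labels, and the sample events are all assumptions made for illustration.

```python
# Hypothetical events; the field names and action labels are assumptions,
# not SearchLog's documented schema.
EVENTS = [
    {"timestamp": 0.0, "action": "query_submitted", "payload": {"query": "solar panels"}},
    {"timestamp": 4.0, "action": "page_load", "payload": {"url": "https://example.org/a"}},
    {"timestamp": 9.0, "action": "scroll", "payload": {"y": 800}},
    {"timestamp": 19.5, "action": "query_submitted", "payload": {"query": "how solar panels work"}},
    {"timestamp": 22.0, "action": "page_load", "payload": {"url": "https://example.org/b"}},
    {"timestamp": 30.0, "action": "stop", "payload": {}},
]

# Actions that end the visit to the current page in this sketch.
NAV_ACTIONS = {"page_load", "query_submitted", "stop"}


def dwell_times(events):
    """Return (url, seconds) for each page, measured until the next navigation event."""
    result = []
    current = None  # (url, load_time) of the page being viewed, if any
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        if ev["action"] not in NAV_ACTIONS:
            continue  # scrolls, clicks, etc. do not end a page visit here
        if current is not None:
            url, start = current
            result.append((url, ev["timestamp"] - start))
            current = None
        if ev["action"] == "page_load":
            current = (ev["payload"]["url"], ev["timestamp"])
    return result


def submitted_queries(events):
    """Return submitted queries in temporal order."""
    ordered = sorted(events, key=lambda e: e["timestamp"])
    return [e["payload"]["query"] for e in ordered if e["action"] == "query_submitted"]


if __name__ == "__main__":
    print(dwell_times(EVENTS))
    print(len(submitted_queries(EVENTS)) - 1, "reformulation(s)")
```

Treating every later query in a session as a reformulation is a simplification; a real study would likely apply its own definition.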

Threat model

The adversary model is minimal because SearchLog is designed for tightly controlled lab studies with explicit participant consent. Data collection is local and explicitly bounded by session start/stop events. The tool does not perform continuous background monitoring, and it limits exposure of sensitive inputs by masking password fields. Because data stay on the local machine, the design does not expose them to remote access during a session. However, search engine layout changes or browser vulnerabilities could compromise data accuracy or privacy, so ongoing maintenance and ethical protocols remain necessary.
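
The masking of password fields can be illustrated with a small sketch. The summary does not say how SearchLog detects such fields, so the `input_type` key (mirroring the HTML `type` attribute), the `value` key, and the `[MASKED]` placeholder below are assumptions.

```python
MASK = "[MASKED]"  # placeholder string; an assumption, not SearchLog's actual marker


def mask_sensitive(event: dict) -> dict:
    """Return a copy of the event with the typed value hidden for password inputs.

    Assumes the payload carries a hypothetical `input_type` key that mirrors
    the HTML `type` attribute of the field the user typed into.
    """
    payload = dict(event.get("payload", {}))  # copy so the caller's event is untouched
    if payload.get("input_type") == "password" and "value" in payload:
        payload["value"] = MASK
    return {**event, "payload": payload}


if __name__ == "__main__":
    typed = {"action": "input", "payload": {"input_type": "password", "value": "hunter2"}}
    print(mask_sensitive(typed))
```

Masking before the event is written anywhere keeps the raw value out of the log entirely, which is the property the threat model relies on.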

Methodology — deep read

The authors' threat model assumes a controlled laboratory environment where participants knowingly engage in search tasks. The adversary model is minimal because data are collected locally and only within explicitly started sessions, which limits inadvertent capture of private activity.

Data come from participants using Chromium-based browsers during lab studies to access live, open web search engines (Google and Bing). The technical validation itself used a non-sensitive demonstration task carried out by testers, not real participants, to verify logging correctness.

The system has two parts: a client-side browser extension that hooks into browser APIs and page DOM events, and a local Flask backend server that receives, timestamps, and stores event data. The extension captures several categories of data: mouse, keyboard, and page interactions, collected by injected JavaScript event listeners that parse the DOM; search-specific events, covering the query, result rankings, and AI-generated summaries; and browser-level events such as tab and window opens, closes, and focus changes, collected through Chrome APIs.
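
The backend's receive, timestamp, and store step might look roughly like the function below. This is a sketch using only the Python standard library: the Flask routing that would call it is omitted, and the `session_id` and `server_timestamp` field names and the one-file-per-session layout are assumptions.

```python
import json
import re
import time
from pathlib import Path


def store_event(raw_body: str, log_dir: Path, clock=time.time) -> dict:
    """Parse one JSON event, add a server-side timestamp, append it to the session log.

    `session_id` and `server_timestamp` are hypothetical field names.
    """
    event = json.loads(raw_body)
    if not isinstance(event, dict):
        raise ValueError("event body must be a JSON object")
    session_id = str(event.get("session_id", ""))
    # Restrict the ID so it is safe to use as a file name.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", session_id):
        raise ValueError("missing or unsafe session_id")
    event["server_timestamp"] = clock()
    log_dir.mkdir(parents=True, exist_ok=True)
    # One JSON object per line, appended in arrival order.
    with (log_dir / f"{session_id}.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event


if __name__ == "__main__":
    body = '{"session_id": "demo", "event_type": "mouse", "action": "click"}'
    print(store_event(body, Path("logs")))
```

In a Flask application, a POST route handler would pass the request body to a function like this; the `clock` parameter exists only to make the timestamp testable.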

Events are streamed and logged immediately as JSON objects and written to disk as they arrive to minimize data loss. Each object carries metadata including session ID, timestamp, event type, and action, plus a detailed payload such as mouse coordinates or typed characters.
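
To make the shape of such an event concrete, here is a minimal sketch of one record as a Python dataclass. The attribute names follow the metadata listed above, but their exact spelling and the payload keys are assumptions, not SearchLog's published schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class LogEvent:
    """One logged interaction; attribute names are illustrative assumptions."""
    session_id: str
    timestamp: float
    event_type: str
    action: str
    payload: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # One compact JSON object, suitable for a line in an event stream.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, line: str) -> "LogEvent":
        return cls(**json.loads(line))


if __name__ == "__main__":
    ev = LogEvent("demo", 12.5, "mouse", "click", {"x": 120, "y": 48})
    print(ev.to_json())
```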

Because SearchLog is a logging tool, there is no training or optimization step. Instead, the evaluation protocol tested six common usage scenarios for completeness and accuracy. Logs were inspected manually to verify that all relevant events were captured in temporal order and that sensitive inputs were masked.
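
The authors report manual inspection; a scripted check could complement it. The sketch below is not part of the paper's validation: it flags events that break temporal order or lack expected fields, assuming hypothetical field names.

```python
# Field names are assumptions for illustration, not SearchLog's schema.
REQUIRED_FIELDS = ("session_id", "timestamp", "event_type", "action")


def out_of_order(events):
    """Return indices whose timestamp is earlier than the preceding event's."""
    return [
        i for i in range(1, len(events))
        if events[i]["timestamp"] < events[i - 1]["timestamp"]
    ]


def missing_fields(events):
    """Return (index, field) pairs for every required field an event lacks."""
    return [
        (i, name)
        for i, ev in enumerate(events)
        for name in REQUIRED_FIELDS
        if name not in ev
    ]


if __name__ == "__main__":
    stream = [
        {"session_id": "s", "timestamp": 1.0, "event_type": "session", "action": "start"},
        {"session_id": "s", "timestamp": 3.0, "event_type": "mouse", "action": "click"},
        {"session_id": "s", "timestamp": 2.0, "event_type": "mouse", "action": "move"},
    ]
    print(out_of_order(stream), missing_fields(stream))
```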

Reproducibility is supported through public code and documentation released under the Apache 2.0 license, including example logs and validation scripts. The system can be extended to future search engines and interfaces by adapting its DOM parsing rules.

A concrete end-to-end example: the researcher starts the local server and enables logging through the extension dialog, and the participant then performs natural web searches on Google. As the participant types "how solar panels work", keyboard events are logged; on submission, a search event records the query along with HTML snapshots and parsed rankings. Mouse movements, scrolling, clicks on results, tab switches, and AI summary interactions (if present) are logged until the researcher stops the session, producing a structured event stream ready for detailed behavior analysis.

Technical innovations

  • Introduction of a modular Chromium browser extension capturing fine-grained, multi-modal user interactions including mouse, keyboard, scrolling, hovered text, and tab/window activity in naturalistic web search sessions.
  • Structured logging schema integrating browser state, search queries, result rankings, and extraction of AI-generated summaries (e.g., Google AI Overviews, Bing Copilot) to capture emerging search interface features.
  • Local Flask backend design that streams and immediately persists ordered JSON event logs alongside HTML snapshots and preprocessed data for robust session-level data collection in lab experiments.
  • Cross-engine support for Google and Bing with easily extensible DOM parsing rules enabling integration of additional search engines and future AI-enhanced search content logging.
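
The idea of per-engine parsing rules, as described in the last item above, can be sketched as a lookup table that maps an engine name to a rule for locating result links. Everything specific here is hypothetical: the engine names, the CSS class names, and the sample markup do not reflect the real structure of Google or Bing result pages or SearchLog's actual rules.

```python
from html.parser import HTMLParser

# Hypothetical rules: engine name -> CSS class marking a result link.
ENGINE_RULES = {"engine_a": "result-link", "engine_b": "hit"}


class ResultLinkParser(HTMLParser):
    """Collect href values of <a> tags carrying a given class, in document order."""

    def __init__(self, result_class: str):
        super().__init__()
        self.result_class = result_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and self.result_class in classes and attr.get("href"):
            self.urls.append(attr["href"])


def parse_rankings(engine: str, html: str):
    """Return [{'rank': 1, 'url': ...}, ...] using the rule registered for `engine`."""
    parser = ResultLinkParser(ENGINE_RULES[engine])
    parser.feed(html)
    parser.close()
    return [{"rank": i, "url": url} for i, url in enumerate(parser.urls, start=1)]


if __name__ == "__main__":
    page = '<a class="result-link" href="https://example.org/1">One</a>'
    print(parse_rankings("engine_a", page))
```

Under this design, supporting another engine means adding one entry to the table, and a layout change on an existing engine means updating its entry.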

Baselines vs proposed

  • Compared to Search-Logger: SearchLog captures all events Search-Logger does, plus hovered text, typed text, tab/window operations, and AI summary extraction.
  • Compared to SearchPanel: SearchLog captures mouse movements, hovered text, tab/window events, and AI summaries absent in SearchPanel.
  • Compared to LogUI: SearchLog adds typed text capture and AI summary extraction beyond LogUI's clicks, scrolling, and mouse events.
  • In the validation experiment, all expected key events were captured across the six scenarios, including session start/end and sensitive input masking.

Limitations

  • Currently limited to Chromium-based browsers; Firefox and Safari are not supported without redesign.
  • Only supports Google and Bing search engines; other engines require custom DOM parser development.
  • Logging depends on stable search engine page layouts and requires maintenance to accommodate UI changes.
  • No large-scale user study or adversarial robustness testing reported; validation done via scripted demonstration tasks.
  • Logs may contain sensitive information requiring careful ethical handling and participant consent.
  • Does not capture eye-tracking or physiological signals; focused mainly on browser interaction events.

Open questions / follow-ons

  • How do different AI-generated summaries on search result pages influence user behavior and information evaluation strategies?
  • Can SearchLog be extended to capture interaction data from conversational AI systems like ChatGPT embedded in web search?
  • What are the privacy implications and user perceptions when logging rich browser interaction data during natural search tasks?
  • How do individual differences (e.g., expertise, cognitive style) affect multi-tab search behaviors and query reformulation patterns captured with SearchLog?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, SearchLog's detailed logging of natural search behaviors, including mouse movements, scroll patterns, and tab switching, can provide ground truth data for modeling human browsing dynamics against automated or scripted bots. Its ability to capture rich interactions on AI-enhanced search pages also offers insight into how humans engage with emerging online content formats, which could guide the design of bot detection and adaptive CAPTCHA challenges tuned to subtle behavioral cues. Researchers developing CAPTCHAs or bot defenses aimed at search environments may also use SearchLog to check that their solutions do not disrupt natural user behavior or hinder typical search workflows.

However, since SearchLog focuses on laboratory settings with explicit consent and local storage, it is not a direct deployment solution for large-scale bot monitoring. Instead, it represents a research tool useful for building foundational datasets and interaction models. Practitioners should consider its limitations in scale, adversarial robustness, and browser/engine coverage when applying findings to real-world CAPTCHA system design and bot defenses.
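
As one illustration of the behavioral features such datasets could feed, the sketch below computes path length and mean speed from a sequence of mouse positions. The `(t, x, y)` tuple layout is an assumption, and these two features are only examples, not a bot-detection method described in the paper.

```python
import math


def mouse_path_features(points):
    """Compute simple features from (t_seconds, x, y) samples sorted by time.

    The tuple layout is an assumption; real logs would need converting first.
    """
    if len(points) < 2:
        return {"path_length": 0.0, "duration": 0.0, "mean_speed": 0.0}
    # Sum of straight-line distances between consecutive samples, in pixels.
    length = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(points, points[1:])
    )
    duration = points[-1][0] - points[0][0]
    mean_speed = length / duration if duration > 0 else 0.0
    return {"path_length": length, "duration": duration, "mean_speed": mean_speed}


if __name__ == "__main__":
    print(mouse_path_features([(0.0, 0, 0), (1.0, 3, 4), (2.0, 3, 4)]))
```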

Cite

bibtex
@article{arxiv2606_05040,
  title={SearchLog: A Web Browser Extension for Capturing Search Logs in Laboratory Studies},
  author={Jiaman He and Riccardo Xia and Dana McKay and Damiano Spina and Johanne R. Trippas},
  journal={arXiv preprint arXiv:2606.05040},
  year={2026},
  url={https://arxiv.org/abs/2606.05040}
}

Articles are CC BY 4.0 — feel free to quote with attribution