The Knowledge Gap in a High-Choice Media Environment: Experimental Evidence from Online Search

Source: arXiv:2605.21019 · Published 2026-05-20 · By Roberto Ulloa, Tiedemann Leonard, Peter Selb, Celina Kacperski

TL;DR

This paper investigates persistent political knowledge inequalities in a high-choice, algorithmically mediated digital environment by experimentally studying self-directed online search. The authors examine how people with different education levels acquire policy-specific knowledge on three German policy topics that differ in divisiveness and complexity (child support, energy transition, cannabis legalization). The study combines a field experiment with randomized encouragements (verbal, financial, or control) and passive browser tracking over a naturalistic 20-hour search window. This design shows how motivation affects search behavior and how education and baseline civic knowledge moderate learning outcomes. The interventions raised search participation equally across education levels, but the resulting knowledge gains were disproportionately concentrated among participants with higher education or higher baseline civic knowledge, supporting the knowledge gap hypothesis in digital contexts. The authors argue that narrowing political knowledge gaps requires not only motivating people to search but also structural reforms and individual-level skill development, so that complex online environments support more equitable learning.

The study links experimentally controlled variation in motivation with detailed web-tracking data and post-search knowledge assessments, overcoming limitations of prior experimental and observational research. Encouragements increased search participation overall, but knowledge gains were most pronounced among participants better prepared to navigate search results effectively. Motivation alone is therefore insufficient to close political knowledge gaps in high-choice media environments; differences in the skills needed to process, select, and integrate information persist as the key source of inequality. These findings have implications for bot-defense, CAPTCHA, and media literacy design aimed at equitable access to trustworthy, policy-relevant information.

Key findings

Randomized verbal and financial encouragements increased the likelihood of participants engaging in online search equally across education levels (RQ3).
Despite equalized search behavior, knowledge gains were significantly larger for participants with higher educational attainment and higher baseline civic knowledge (H1b, H2b).
Local average treatment effect (LATE) estimates show that actual information search positively impacted post-search policy knowledge scores, but the gains were moderated by education level, supporting the knowledge gap hypothesis.
Participants with higher education or civic knowledge demonstrated more effective navigation of search results, inferred from post-hoc analyses examining browsing behavior.
Across the 871 tracked participants, 1,886 policy-related website visits were recorded during the 20-hour search windows, of which 930 were search-query visits.
Knowledge tests consisted of 5 items per policy area, selected for difficulty via Rasch modeling to reduce ceiling effects.
No statistically significant reduction in education-based knowledge disparities was observed despite motivational interventions and increased search activity.
The experimental design achieved >98.5% power to detect small effects (Cohen’s f = 0.15) for ITT analyses across waves.
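The power figure above can be sanity-checked with a back-of-the-envelope calculation. The sketch below is a minimal illustration under stated assumptions, not a reproduction of the authors' power analysis: it assumes a one-way comparison of the three encouragement arms, a noncentrality parameter λ = f²·N, and a large-sample chi-square(2) approximation to the F test. Using the pooled N of 1,216 is also an assumption; the summary does not say whether power was computed per wave or pooled.

```python
import math


def noncentral_chi2_sf_df2(x: float, lam: float, terms: int = 400) -> float:
    """P(X > x) for a noncentral chi-square with 2 degrees of freedom.

    Uses the Poisson-mixture representation X ~ chi2(2 + 2J), J ~ Poisson(lam / 2),
    and the closed-form survival function of a central chi-square with even df.
    """
    half_x = x / 2.0
    mu = lam / 2.0
    pois = math.exp(-mu)  # Poisson pmf at J = 0
    term = 1.0            # (x/2)^j / j!, starting at j = 0
    partial = 0.0         # running sum_{i<=j} (x/2)^i / i!
    total = 0.0
    for j in range(terms):
        partial += term
        # P(chi2(2 + 2j) > x) = exp(-x/2) * sum_{i=0}^{j} (x/2)^i / i!
        total += pois * math.exp(-half_x) * partial
        term *= half_x / (j + 1)
        pois *= mu / (j + 1)
    return total


def three_arm_power(f: float, n_total: int, alpha: float = 0.05) -> float:
    """Approximate power of a 3-group omnibus test for Cohen's f.

    Assumption: lam = f**2 * n_total, with a chi-square(2) approximation to the
    F test (reasonable when the error degrees of freedom are in the hundreds).
    """
    lam = f ** 2 * n_total
    # chi2(2) has survival exp(-x/2), so this critical value is exact.
    critical = -2.0 * math.log(alpha)
    return noncentral_chi2_sf_df2(critical, lam)


if __name__ == "__main__":
    # Illustrative inputs: f = 0.15 as reported, pooled N = 1,216 (an assumption).
    print(f"approximate power: {three_arm_power(0.15, 1216):.4f}")
```

With no effect (f = 0) the function returns exactly α, which is a quick check that the critical value and mixture are wired correctly.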

Threat model

N/A. The study does not model an active adversary but examines how pre-existing disparities in education and civic knowledge moderate the efficacy of self-directed online political learning under naturalistic, algorithmically curated search conditions. No direct evaluation of malicious attacks such as bots, misinformation campaigns, or adversarial manipulation was performed.

Methodology — deep read

Threat Model & Assumptions: There is no adversary in the security sense; the "threat" is abstracted as educational and civic knowledge disparities among citizens trying to acquire accurate policy-specific information through online search. The study assumes participants have varying cognitive resources, prior knowledge, and search skills but face the same real-world algorithmic search environment. The roles of selective exposure, motivation, and the ability to navigate search results are central. Because no malicious actors or intentional misinformation spreaders are evaluated, adversarial threat modeling is not applied.
Data: The sample consists of German adults recruited from a commercial web-tracking panel (Bilendi GmbH) compliant with GDPR, resulting in 1,216 participants across three policy-topic waves in 2023. Each participant was assigned to one of three conditions (control, verbal encouragement, financial incentive). Passive browser tracking captured ~452,000 URL visits (~267,000 unique URLs) over 20-hour treatment windows, of which 1,886 visits were policy-topic relevant. Policy knowledge was assessed post-search using 5 difficult multiple-choice items per topic, chosen through pretests and Rasch analysis to avoid ceiling effects. Baseline civic knowledge was measured previously using validated political knowledge items.
Architecture / Algorithm: The paradigm integrates a field experimental design with a two-stage analysis framework. ITT effects of encouragement assignment on knowledge and search behavior are estimated via OLS regression with interaction terms for education and civic knowledge. LATE effects of actual search participation are estimated using instrumental variables regression (two-stage least squares), where encouragement assignment is the instrument for search behavior. Browser URLs were manually annotated for topical relevance with verification from a BERT-large classifier (F1 > 0.97) to ensure accurate identification of policy-related browsing.
Training Regime: N/A for model training, but the experiment was organized in three waves, each on a different policy topic, conducted in sequence over a one-month window in 2023. Each participant was given 20 hours to freely search the internet after encouragement assignment. Knowledge tests followed immediately after. Participant characteristics (age, gender, education, civic knowledge) were balanced across conditions, with random assignment seeded by the survey platform. Power calculations indicated >98.5% power for detecting small effects.
Evaluation Protocol: The main outcomes are post-search policy knowledge scores, with covariates including educational level (low vs. high) and baseline civic knowledge. ITT models assess the effect of encouragement assignment. LATE analysis isolates the effect of actual search induced by encouragement, accounting for possible unobserved confounders under instrumental variable assumptions. Search behavior metrics include number of visits, time spent, and search queries. Analyses were conducted separately by policy topic because the test items are not comparable across topics. Multiple interaction models examined moderating effects. No formal adversarial robustness testing or distribution shift evaluation was performed, and no cross-validation was applied, since this is a field experiment rather than a predictive modeling task.
Reproducibility: The authors publicly release the analysis code and anonymized data at OSF (https://osf.io/pv8ey/), enabling reproducibility. The web tracking dataset is commercial but accessible. Exact knowledge test items, code for URL annotation, and analytic scripts are documented. Randomization procedure and experimental workflows are described in detail, supporting replication in similar multi-wave online field experiments.
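The two-stage logic in the methodology (ITT with an education interaction, then LATE with encouragement as the instrument) can be illustrated with cell means. The sketch below is not the authors' estimation code, which is released on OSF; it assumes a simplified design in which the verbal and financial arms are pooled into one binary encouragement indicator `z`, search take-up `d` is binary, and education is a binary `high` indicator. All field names are hypothetical, and covariates, the civic-knowledge moderator, and standard errors are omitted.

```python
from statistics import mean
from typing import Dict, Sequence

# One dict per participant. Hypothetical fields:
#   z    = 1 if assigned to any encouragement (verbal and financial pooled), else 0
#   d    = 1 if the participant searched for the policy topic, else 0
#   y    = post-search policy knowledge score (0-5)
#   high = 1 for higher formal education, else 0
Row = Dict[str, float]


def _mean_where(rows: Sequence[Row], field: str, **conds: int) -> float:
    """Mean of `field` over rows matching all equality conditions."""
    vals = [r[field] for r in rows if all(r[k] == v for k, v in conds.items())]
    if not vals:
        raise ValueError(f"no rows match {conds}")
    return mean(vals)


def itt_interaction(rows: Sequence[Row]) -> Dict[str, float]:
    """Saturated OLS y ~ z * high, recovered exactly from the four cell means."""
    c00 = _mean_where(rows, "y", z=0, high=0)
    c10 = _mean_where(rows, "y", z=1, high=0)
    c01 = _mean_where(rows, "y", z=0, high=1)
    c11 = _mean_where(rows, "y", z=1, high=1)
    return {
        "intercept": c00,                       # control, lower education
        "z": c10 - c00,                         # ITT for lower education
        "high": c01 - c00,                      # education gap under control
        "z_x_high": (c11 - c01) - (c10 - c00),  # difference in ITTs by education
    }


def wald_late(rows: Sequence[Row]) -> float:
    """LATE = reduced-form effect on y divided by first-stage effect on d.

    With one binary instrument, one binary treatment and no covariates this
    equals the just-identified 2SLS estimate (under independence, exclusion
    and monotonicity it is the average effect among compliers).
    """
    first_stage = _mean_where(rows, "d", z=1) - _mean_where(rows, "d", z=0)
    if first_stage == 0:
        raise ValueError("encouragement did not shift search; LATE is undefined")
    reduced_form = _mean_where(rows, "y", z=1) - _mean_where(rows, "y", z=0)
    return reduced_form / first_stage


def late_by_education(rows: Sequence[Row]) -> Dict[int, float]:
    """Estimate the LATE separately within each education group."""
    return {g: wald_late([r for r in rows if r["high"] == g]) for g in (0, 1)}
```

Under the knowledge gap hypothesis, one would expect a positive `z_x_high` on the knowledge outcome and a larger LATE for `high = 1` than for `high = 0`, even when the first stage (search take-up) is similar across groups.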

Example end-to-end flow: A participant recruited from the Bilendi panel completes baseline political knowledge measures and a demographic survey. In one of the topic waves (e.g., cannabis legalization), they are randomly assigned to receive no prompt, a verbal prompt, or a monetary prompt to search online for policy information for up to 20 hours. Their browser activity is passively tracked, visited URLs are annotated for topical relevance, and search queries are identified. After 20 hours, they complete a post-test including 5 policy knowledge items. Researchers then estimate the effect of encouragement on whether the participant searched, and the effect of search on knowledge as moderated by education level, using two-stage regression.
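The flow above depends on flagging which tracked URLs are search-engine queries and which are relevant to the policy topic. The sketch below shows one way to do that with the Python standard library. The host-to-parameter table and the keyword list are hypothetical, and keyword matching is a deliberately crude stand-in for the study's manual annotation verified by a BERT-large classifier.

```python
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical host -> query-parameter table; the study's actual rules are not given.
SEARCH_ENGINES: Dict[str, str] = {
    "www.google.com": "q",
    "www.google.de": "q",
    "www.bing.com": "q",
    "duckduckgo.com": "q",
    "www.ecosia.org": "q",
}

# Hypothetical keywords standing in for manual coding plus the BERT-large classifier.
TOPIC_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "cannabis": ("cannabis", "legalisierung", "marihuana"),
}


def search_query(url: str) -> Optional[str]:
    """Return the decoded search query if the URL is a search-engine results page."""
    parsed = urlparse(url)
    param = SEARCH_ENGINES.get(parsed.netloc.lower())
    if param is None:
        return None
    values = parse_qs(parsed.query).get(param)
    return values[0] if values else None


def is_topic_relevant(url: str, topic: str) -> bool:
    """Crude keyword match on the URL and, for search pages, the decoded query."""
    text = url.lower()
    query = search_query(url)
    if query:
        text += " " + query.lower()
    return any(keyword in text for keyword in TOPIC_KEYWORDS[topic])


def summarize_visits(urls: Iterable[str], topic: str) -> Dict[str, int]:
    """Count topic-relevant visits and how many of them are search-query visits."""
    relevant = [u for u in urls if is_topic_relevant(u, topic)]
    return {
        "relevant_visits": len(relevant),
        "search_query_visits": sum(1 for u in relevant if search_query(u) is not None),
    }
```

A production pipeline would also have to cope with redirects, many more localized search domains, and pages whose relevance is not visible in the URL at all, which is where a trained classifier earns its keep.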

Technical innovations

Combining randomized motivational encouragements with passive browser tracking data to causally link self-directed search to knowledge acquisition.
Use of a high-recall BERT-large classifier to accurately annotate policy-related URLs from dense browser logs, validated against manual coding.
Estimating Local Average Treatment Effects (LATE) of intentional information search on learning outcomes, leveraging encouragement assignment as an instrument.
Integration of multi-topic, policy-specific, Rasch-calibrated knowledge tests with real-world, naturalistic internet search behavior over a realistic 20-hour window.
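The classifier validation mentioned in the methodology (F1 > 0.97 against manual coding, with an emphasis on recall) rests on standard binary metrics. The sketch below computes them from paired labels; the encoding 1 = policy-relevant is an assumption, and the study's validation sample and protocol are not described in this summary.

```python
from typing import Sequence, Tuple


def precision_recall_f1(manual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1, treating manual coding as ground truth."""
    if len(manual) != len(predicted):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for m, p in zip(manual, predicted) if m == 1 and p == 1)
    fp = sum(1 for m, p in zip(manual, predicted) if m == 0 and p == 1)
    fn = sum(1 for m, p in zip(manual, predicted) if m == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Recall matters in this setting because a missed relevant visit would make a participant look as if they had not searched, which would distort the search take-up variable used in the LATE analysis.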

Datasets

Bilendi GmbH commercial web-tracking panel — ~1,216 participants (871 tracked) — proprietary but accessible via OSF
Civic knowledge baseline items — 11 questions validated for German politics — publicly described in Moosdorf et al. (2020)
Policy knowledge tests — 5 items per policy topic (child support, energy transition, cannabis legalization) — designed and validated in study
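The policy knowledge items were selected with a Rasch model to avoid ceiling effects. In the Rasch model the probability of a correct answer depends only on the gap between person ability θ and item difficulty b: P = 1 / (1 + exp(-(θ - b))). The sketch below uses hypothetical difficulties (not the study's calibrated items) to show why harder items help: on an easy 5-item test, respondents of moderate and high ability both score near the maximum, while a harder test keeps their expected scores apart.

```python
import math
from typing import Sequence


def rasch_prob(theta: float, b: float) -> float:
    """P(correct) under the Rasch model for ability theta and item difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_score(theta: float, difficulties: Sequence[float]) -> float:
    """Expected number of correct answers across items with the given difficulties."""
    return sum(rasch_prob(theta, b) for b in difficulties)


# Hypothetical difficulty sets for two 5-item tests (not the study's items).
EASY_ITEMS = [-2.5, -2.0, -1.5, -1.0, -0.5]
HARD_ITEMS = [0.0, 0.5, 1.0, 1.5, 2.0]

if __name__ == "__main__":
    for theta in (1.0, 2.0):
        print(f"theta={theta}: easy test {expected_score(theta, EASY_ITEMS):.2f}, "
              f"hard test {expected_score(theta, HARD_ITEMS):.2f}")
```

With only five items per topic, this compression at the top matters: if most knowledgeable participants already answer nearly everything correctly, education-related differences in learning become hard to detect.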

Baselines vs proposed

Verbal encouragement vs control: modest but statistically significant increase in search participation; no significant knowledge increase for low-education participants (the control-group mean knowledge score is not stated numerically).
Financial encouragement vs control: stronger increase in search participation, but larger knowledge gains again limited to high-education participants.
LATE of actual search on knowledge (averaged across policies): significant positive effect for high-education participants (effect size not stated), smaller or non-significant effect for low-education participants.
Search effort metrics (e.g., number of page visits): higher for high-education participants, suggesting differences in how effectively participants navigate results even though the likelihood of searching was similar.

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.21019.

Fig 1: Conceptual framework of political learning outcomes.

Fig 2: Policy topic selection.

Fig 3: Interaction Effects on Knowledge Scores and Information Search.

Limitations

Non-experimental structural factors like algorithmic ranking and external media coverage were uncontrolled and may influence learning independently.
Measurement of knowledge gains relied on a small set of five policy-specific questions per topic, limiting granularity.
The 20-hour search window may not capture longer-term learning or retention effects.
The study focused on a German political context during a non-election period, limiting generalizability to other countries or electoral cycles.
No adversarial tests or examination of misinformation exposure were included, which could affect knowledge inequalities.
Digital trace data capture only online search behavior, missing offline or interpersonal information acquisition channels.

Open questions / follow-ons

How can digital platforms and search engines structurally adapt to surface more equitable, accessible political information to reduce knowledge gaps?
Can targeted interventions to improve individual search/navigation skills mitigate the efficacy gap observed between education levels?
What role do misinformation or distrust in sources play in widening or narrowing knowledge gaps in self-directed search?
How do these findings generalize across different political cultures, electoral cycles, and media ecosystems?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, this study highlights that equitable access to politically relevant information depends not only on removing access barriers or increasing motivation but also on users' ability to navigate complex search environments shaped by algorithmic curation. Bot defenses and CAPTCHAs should therefore filter automated traffic without adding friction that interrupts genuine users' information seeking. Preserving meaningful human agency in information selection, and preventing automated scraping or manipulation, also help keep digital political knowledge acquisition trustworthy and fair. Insights into differential search behavior may help design adaptive challenges that account for user proficiency while restricting automated, low-effort bots that could distort information ecosystems.

Cite

bibtex

@article{arxiv2605_21019,
  title={The Knowledge Gap in a High-Choice Media Environment: Experimental Evidence from Online Search},
  author={Roberto Ulloa and Tiedemann Leonard and Peter Selb and Celina Kacperski},
  journal={arXiv preprint arXiv:2605.21019},
  year={2026},
  url={https://arxiv.org/abs/2605.21019}
}
