Evaluation of Alternative-Based Information Systems for Deliberative Polling using an Agentic Simulator

Source: arXiv:2606.11692 · Published 2026-06-10 · By Rwaida Alssadi, Khulud Alawaji, Balaji Kasula, Muntaser Syed, Badria Alfurhood, Markus Zanker et al.

TL;DR

This paper addresses the coverage problem in large-scale deliberative polling: ensuring that each voter is exposed to a representative sample of the full range of arguments. To evaluate candidate solutions, the authors introduce the Agentic Bipolar Argumentation Simulator (ABAS), a simulation platform grounded in a formal model (BAPDF) that represents endorsing and opposing justifications, attack and enhance relations between them, and weights on voters and relations. ABAS simulates thousands of autonomous voter agents with latent opinions who iteratively vote, select or author justifications, create argumentation links, and receive ranked recommendations based on endorsement mass. By measuring the fraction of distinct argument reason tags covered in the recommendations shown to each voter, ABAS quantifies coverage as a proxy for fairness and epistemic breadth. Through reproducible experiments on two real-world AI-related propositions, the study analyzes how creativity rate (the probability of authoring a new justification), recommendation size, argumentation density, and population scale affect coverage and corpus diversity. Coverage improves with larger recommendation budgets, higher creativity rates, and larger populations, but the results also reveal a structural early-arrival penalty: early voters see less coverage. Under adversarial strategic attacks on the argument relation graph, uniform relation weighting collapses coverage dramatically, while a reversed-PageRank-style author-weighted relation scheme resists these manipulations and preserves coverage substantially better. Overall, ABAS provides a reproducible, agentic simulation framework that operationalizes the NP-hard Subsuming Justification Problem and enables quantitative evaluation of argument recommendation mechanisms in adversarial deliberative settings.

Key findings

Mean combined coverage of endorsement-ranked top-K recommendations reaches approximately 0.808 ± 0.035 for K=20, N=1000, pown=0.10 on the BRA topic, with similar values on the UBI topic.
Coverage grows monotonically with creativity rate pown, from 0.693 at pown=0.02 to 0.881 at pown=0.30, an improvement of about 19 percentage points.
Recommendation size K strongly affects coverage, which rises from 0.463 at K=5 to 0.836 at K=30, with diminishing returns beyond K=15.
A larger population size N increases coverage and makes it more stable and comprehensive, particularly for late voters.
An early-arrival penalty means that voters arriving before the corpus matures receive lower coverage; at pown=0.02 full tag coverage only appears after ~800 voters, while at pown=0.30 it appears after ~150 voters.
Author-count weighted relation scoring (reversed-PageRank with weights proportional to the fraction of side voters authoring the relation) resists coordinated tag-flood attacks markedly better than uniform relation weights, preserving coverage under strategic manipulation.
Corpus diversity in terms of unique reason tags and own justifications grows with creativity rate pown and correlates with higher overall recommendation coverage.
Argumentation graph density (attack and enhance links) scales with corpus size and link probability plinks, influencing coverage robustness.
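The coverage figures above are fractions of the reason-tag vocabulary that a voter's recommendation list touches. As a minimal sketch of that metric (the function name, the tag strings, and the data layout are illustrative assumptions, not the paper's code), it could be computed like this:

```python
from typing import Iterable, Set


def coverage(recommended: Iterable[Set[str]], corpus_tags: Set[str]) -> float:
    """Fraction of the current corpus tag vocabulary present in the recommendations.

    `recommended` holds the reason-tag set of each recommended justification;
    `corpus_tags` is the tag vocabulary of the corpus at the moment the voter arrives.
    Both names are hypothetical, chosen for this illustration only.
    """
    if not corpus_tags:
        return 0.0
    seen = set().union(*recommended)  # all tags shown to this voter
    return len(seen & corpus_tags) / len(corpus_tags)


# Toy example: four tags exist in the corpus, the recommendations touch three.
corpus = {"cost", "fairness", "autonomy", "risk"}
shown = [{"cost", "fairness"}, {"fairness", "risk"}]
example_coverage = coverage(shown, corpus)
print(example_coverage)  # 0.75
```

Because the denominator is the vocabulary at the time the voter arrives, the same list can score differently for an early voter and a late voter, which is where the early-arrival penalty shows up.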

Threat model

The adversary is a coordinated coalition of strategic voters within an authenticated electorate who cannot create multiple identities (no Sybil attacks) and do not manipulate vote direction but seek to game the recommendation ranking by adversarially submitting argumentation graph relations (attack and enhancement links). Their goal is to flood recommendation slots with narrow or poisonous tags that reduce coverage and diversity across the electorate by leveraging endorsement mass propagation. The adversary cannot directly alter other voters' opinions or the base vote content.

Methodology — deep read

1. Threat Model and Assumptions: The study assumes an authenticated electorate with no Sybil attacks (no multiple identities), and voters cast an honest vote according to their latent opinion. The only possible strategic manipulation surface is the argument relation graph, where some voters (attackers) coordinate to submit relations aiming to game recommendation rankings. Attackers vote sincerely but adversarially craft enhance and attack links to influence visibility.
2. Data: The corpus represents two real-world binary propositions related to AI policy: Basic Resource Assurance (BRA) and Basic Income (UBI). A set of 30 endorsing and 30 opposing atomic reason tags were manually defined per topic. Justifications are natural-language passages instantiating one or more such tags. The voter agents are simulated, assigned latent opinions drawn uniformly from [-1,1], which determine vote direction. Opinion quartiles map deterministically to generation of position texts using reason-tagged sentences.
3. Architecture and Algorithm: ABAS simulates N agents sequentially voting. Each agent receives a personalized recommendation list of top K justifications per side, ranked by an observable endorsement score derived from reversed-PageRank over the attack and enhance graph with weighted relations. Agents may select an existing justification from their recommendations (based on TF-IDF similarity to their position text) or author a new one (with probability pown). Justifications have attack and enhance relations probabilistically added with probability plinks. The framework models the Subsuming Justification Problem (SJP), which seeks a K-size set of justifications covering all reason tags. Since SJP is NP-hard, a greedy approximation algorithm selects justifications maximizing coverage incrementally. Coverage is defined as the fraction of the contemporaneous corpus reason-tag vocabulary represented in the recommended justifications for a voter.
4. Training and Simulation Regime: Each simulation run processes all N voters sequentially. Parameters controlled include pown (creativity rate), pexisting, pnone (selection probabilities), recommendation size K, link probability plinks, and population size N. The corpus and argumentation graph persist and grow across multi-round runs. 10 random seeds per configuration are used. Scaling tests vary each parameter one-at-a-time. Adversarial runs inject up to 25% attackers who strategically submit relations according to hub-riding or tag-flood attack strategies.
5. Evaluation Protocol: Metrics include mean and median combined coverage per voter of endorsement-ranked recommendations, number of unique reason tags covered, own justifications count, argument graph edge counts, and standard deviations across seeds and within-run variability across voters. Coverage is compared to an oracle greedy coverage upper bound computed offline using full access to reason tags (not available to live recommender). The dynamics of coverage over voter order reveal early-arrival penalties.
6. Reproducibility: The simulator stores a database of votes, justifications, relations, and recommendation snapshots for auditability and after-the-fact reconstruction. Source code, parameter configurations, and seeds are documented; the argumentation web-browser supports manual inspection and interactive exploration. No public dataset of real-world deliberative texts is involved, but the reason-tag schemas and generated texts are fully described.
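The greedy approximation for the Subsuming Justification Problem mentioned above can be illustrated with the textbook greedy maximum-coverage heuristic. This is a sketch under assumptions: the summary does not give the paper's exact procedure or tie-breaking rule, and the justification ids and tags below are invented for the example.

```python
from typing import Dict, List, Set


def greedy_cover(justifications: Dict[str, Set[str]], k: int) -> List[str]:
    """Pick up to k justifications, each time taking the one that adds the most new tags.

    Ties are broken by smallest id (an assumption made here for determinism).
    Stops early once no remaining justification adds an uncovered tag.
    """
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = dict(justifications)
    while remaining and len(chosen) < k:
        # max() returns the first maximal element, so iterating sorted ids breaks ties by id.
        best = max(sorted(remaining), key=lambda j: len(remaining[j] - covered))
        if not remaining[best] - covered:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical corpus: justification id -> reason tags it instantiates.
corpus = {
    "j1": {"a", "b"},
    "j2": {"b", "c", "d"},
    "j3": {"a"},
    "j4": {"e"},
}
top2 = greedy_cover(corpus, 2)
top10 = greedy_cover(corpus, 10)
print(top2)   # ['j2', 'j1']
print(top10)  # ['j2', 'j1', 'j4']
```

The oracle in the evaluation protocol has direct access to the tags, as this function does; the live recommender ranks by endorsement score instead, which is why its coverage sits below the greedy bound.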

Technical innovations

Agentic Bipolar Argumentation Simulator (ABAS) combining LLM-seeded agent models with a formal bipolar abstract poll debate framework (BAPDF) facilitating systematic evaluation of argument coverage in polling scenarios.
A novel endorsement-based reversed-PageRank scoring of justifications propagating endorsement mass through attack and enhance relations weighted by author ratios, improving resistance to strategic manipulation.
Integration of symbolic atomic reason tags for justifications to enable transparent, enumerable, and formally analyzable coverage metrics tied to argumentative comprehensiveness versus generic diversity metrics.
Implementation of comprehensive multi-round sequential simulation that preserves corpus and relations across rounds, enabling study of early-arrival coverage penalties and longitudinal auditability in deliberative polling.
Adversarial stress testing of argumentation relation weighting schemes (uniform vs author-count weighted) against coordinated tag-flood and hub-riding attacks, demonstrating practical manipulations and defenses.
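To make the contrast between uniform and author-count weighting concrete, here is a toy one-hop sketch. It is not the paper's scoring formula: the sign convention (an enhance link adds the source's endorsers to the target, an attack link subtracts them), the data layout, and all identifiers are assumptions for illustration. The only detail taken from the summary is that the weighted scheme sets a relation's weight to the fraction of side voters who authored it. For brevity the example uses a single attacker rather than a coalition.

```python
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]  # (source, target, kind); kind is "enhance" or "attack"


def relation_weights(authors: Dict[Relation, Set[str]], side_voters: int,
                     scheme: str) -> Dict[Relation, float]:
    """Weight each relation uniformly, or by the fraction of side voters who authored it."""
    if scheme == "uniform":
        return {rel: 1.0 for rel in authors}
    if scheme == "author_fraction":
        return {rel: len(who) / side_voters for rel, who in authors.items()}
    raise ValueError(f"unknown scheme: {scheme}")


def one_hop_scores(endorsers: Dict[str, int],
                   weights: Dict[Relation, float]) -> Dict[str, float]:
    """Direct endorsers plus one hop of weighted mass from linked justifications (assumed form)."""
    scores = {j: float(n) for j, n in endorsers.items()}
    for (src, dst, kind), w in weights.items():
        sign = 1.0 if kind == "enhance" else -1.0
        scores[dst] += sign * w * endorsers[src]
    return scores


def ranking(scores: Dict[str, float]) -> List[str]:
    """Justification ids from highest to lowest score, ties broken by id."""
    return sorted(scores, key=lambda j: (-scores[j], j))


endorsers = {"a": 10, "b": 10, "flood": 1}
authors = {
    ("a", "b", "enhance"): {"v1", "v2", "v3", "v4", "v5"},  # link submitted by five voters
    ("a", "flood", "enhance"): {"x1"},                       # links submitted by one attacker
    ("b", "flood", "enhance"): {"x1"},
}
uniform_rank = ranking(one_hop_scores(endorsers, relation_weights(authors, 20, "uniform")))
weighted_rank = ranking(one_hop_scores(endorsers, relation_weights(authors, 20, "author_fraction")))
print(uniform_rank)   # ['flood', 'b', 'a']
print(weighted_rank)  # ['b', 'a', 'flood']
```

In this toy setting, uniform weights let two self-serving links push a barely endorsed justification to the top, while author-fraction weights keep it at the bottom because few voters stand behind those links.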

Datasets

BRA (Basic Resource Assurance) — simulated corpus with 30 supporting and 30 opposing reason tags, proprietary simulation-generated
UBI (Basic Income) — simulated corpus with 30 supporting and 30 opposing reason tags, proprietary simulation-generated

Baselines vs proposed

Greedy K-coverage Oracle: mean coverage > 0.95 for K=20 vs Endorsement-ranked live recommender: ~0.81
Uniform relation weight scoring under tag-flood attack: coverage collapses near zero vs weighted author-count relation scoring: coverage remains above 0.6 with 25% attackers
Coverage at K=5: 0.46 vs K=30: 0.84 (BRA topic)
Coverage at pown=0.02: 0.69 vs pown=0.30: 0.88
Coverage at N=200: ~0.65 vs N=5000: ~0.89

Limitations

The simulation assumes truthful voting behavior except for strategic relation submission; real-world strategic voting might involve vote flipping or identity fraud.
Reason tags and justification texts are synthetic and hand-authored; real argument mining from free text might introduce noise and ambiguity not captured here.
Only one-hop reversed PageRank scoring is used; deeper multi-hop propagation might yield different results but is computationally heavier.
Evaluation does not include human subjects or actual user interaction data, limiting ecological validity of modeled behavior and feedback effects.
Early-arrival penalty inherent in sequential recommendation remains a structural challenge not fully solved.
The adversarial model assumes attackers only manipulate the relation graph; other attack surfaces (e.g., fake justification texts) are not studied.

Open questions / follow-ons

How would argument mining from raw text and noisy or ambiguous justification extraction affect coverage and recommendation robustness?
Can multi-hop propagation or alternative graph-based scoring methods improve recommendation quality and manipulation resistance further?
What mechanisms could mitigate the early-arrival penalty to ensure more uniform coverage for early voters in sequential deliberations?
How would incorporating user feedback loops and vote changes impact corpus evolution and coverage dynamics in ABAS?

Why it matters for bot defense

This work provides a rigorous simulation framework and formalized evaluation metrics that bot-defense and CAPTCHA engineers can adapt to study argument recommendation and user interaction fairness in adversarial, large-scale online polling or deliberation systems. The approach of leveraging endorsement propagation in a weighted bipolar argument graph and modeling strategic relation manipulation parallels concerns about adversarial actor influence in reputation or trust scoring systems common in web security contexts. Coverage metrics based on transparent reason tags are instructive for designing interpretable recommendation filters that balance diversity and engagement without enabling attack vectors that degrade information quality. Additionally, the agentic simulation methodology can inform realistic synthetic user behavior models for evaluating platform resilience to coordinated strategic attacks, analogous to bot or spam campaigns. However, the domain focus on deliberative polling is specialized; direct application to CAPTCHA systems requires adaptation of concepts around coverage, endorsement, and manipulation resistance tailored to authentication flows rather than argument recommendation.

Cite

bibtex

@article{arxiv2606_11692,
  title={ Evaluation of Alternative-Based Information Systems for Deliberative Polling using an Agentic Simulator },
  author={ Rwaida Alssadi and Khulud Alawaji and Balaji Kasula and Muntaser Syed and Badria Alfurhood and Markus Zanker and Marius Silaghi },
  journal={arXiv preprint arXiv:2606.11692},
  year={ 2026 },
  url={https://arxiv.org/abs/2606.11692}
}
