Evidence Markets

Source: arXiv:2606.07434 · Published 2026-06-05 · By Safwan Hossain, Gabriel Andrade, Chengqi Zang, Yiling Chen

TL;DR

This paper addresses two core limitations of traditional prediction markets: they only aggregate beliefs without revealing underlying evidence or reasoning, and they require an event with a known external ground truth and resolution time. The authors introduce evidence markets, a generalization that incentivizes participants to submit both beliefs and accompanying evidence, and allows for endogenous resolution using the accumulated evidence when external ground truth is unavailable. The core mechanism modifies the classical logarithmic market scoring rule (LMSR) by dynamically adjusting the liquidity parameter based on the accumulated evidence quality. This coupling of price formation to evidence quality enables bounded platform loss and incentivizes truthful reporting of both beliefs and evidence. The authors prove that truthful evidence and belief reporting is always an 𝜀-dominant strategy incentive compatible (DSIC) equilibrium for endogenous resolution and strict DSIC for exogenous resolution. They also propose a practical evidence verification framework using large language models (LLMs) as judges augmented by staking and dispute resolution. A running example throughout the paper is LLM evaluation, where the question of which model performs best on a task lacks objective external ground truth, making evidence markets well suited. Overall, the contribution is a theoretically grounded market design that expands prediction markets' scope by incorporating evidence and enabling endogenous resolution through probabilistic softmax sampling over accumulated evidence support.

Key findings

The proposed evidence-augmented LMSR bounds platform loss by β(R0) · log n, where β(R0) is the liquidity at the initial evidence quality R0 and n is the number of alternatives (Proposition 1).
The shift in a trader’s resolution belief caused by withholding evidence is bounded by ‖q(Et) − q(E′t)‖1 ≤ |Et| / (τK) (Theorem 1), so the effect of manipulation can be made arbitrarily small by raising the softmax temperature τ.
The dynamic liquidity parameter β(·) decreases with cumulative evidence quality Rt, which ensures a positive payoff for submitting higher-quality evidence (Figure 2).
Truthful belief and full evidence submission is an 𝜀-dominant strategy incentive compatible (DSIC) equilibrium under endogenous resolution and strict DSIC under exogenous resolution (Proposition 1, Theorem 2).
Endogenous resolution samples the market outcome from a softmax distribution over cumulative evidence support fractions θi, instead of requiring external ground truth (Definition 4).
The evidence quality function r(·) is assumed to be monotone increasing and non-negative, which allows flexible verification and filtering depending on the evidence context.
A proposed asynchronous execution algorithm decouples trade execution from the delays of costly evidence verification.
The authors formalize and bound how selective evidence withholding shifts a trader’s resolution belief, which is crucial for incentive compatibility.
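
One way to see where a loss bound of the form β(R0) · log n could come from is a telescoping argument over the per-trader payoffs. The derivation below is a sketch under stated assumptions, not the paper's proof: it assumes the market state after trader t equals that trader's report, that the initial state q_0 is uniform over the n alternatives, and that β stays non-negative.

```latex
% Assumptions (illustrative, not taken from the paper's proof):
%   q_t is the market state after trader t (equal to the report \hat{q}_t),
%   q_0 is uniform over n alternatives, and \beta(\cdot) \ge 0.
\begin{align*}
\sum_{t=1}^{T} \Big[ \beta(R_t)\log q_t(\omega) - \beta(R_{t-1})\log q_{t-1}(\omega) \Big]
  &= \beta(R_T)\log q_T(\omega) - \beta(R_0)\log q_0(\omega) \\
  &\le -\beta(R_0)\log \tfrac{1}{n} \\
  &= \beta(R_0)\log n .
\end{align*}
```

The inequality uses only that β(R_T) ≥ 0 and log q_T(ω) ≤ 0, so the total paid out to traders never exceeds β(R0) · log n, whichever outcome ω is resolved.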

Threat model

Adversaries are rational traders in the market who may attempt to strategically withhold subsets of their private evidence to bias the endogenous market resolution in their favor and maximize payoffs. They cannot falsify or fabricate evidence units, only selectively submit subsets. They know the market mechanism and anticipate how selective evidence submission affects beliefs and payoffs. They lack direct control over future external ground truth in exogenous resolution mode. The platform cannot observe private information but relies on incentives and verification to ensure truthfulness.

Methodology — deep read

Threat Model & Assumptions: The adversaries are market traders who may strategically report beliefs or selectively submit subsets of their private evidence to maximize payoffs. The model assumes traders cannot fabricate or tamper with atomic evidence units, only selectively withhold evidence subsets. Evidence quality is publicly observable and cumulative evidence impacts liquidity dynamically. The market is resolved either externally by a known future event or endogenously by the submitted evidence.
Data: There is no real-world dataset. The model is abstract but grounded in a running example of LLM evaluation, where evidence units are question-answer pairs that assess model correctness. The evidence quality function r(·) is assumed monotone and non-negative but is otherwise left abstract, so no dataset or preprocessing is involved.
Architecture/Algorithm:

The market uses a logarithmic market scoring rule (LMSR) with a liquidity parameter β(Rt) that decreases as cumulative evidence quality Rt increases.
Traders arrive sequentially, each holding a private belief distribution qt over n alternatives and a private evidence set Et.
Each trader submits a subset of evidence E′t ⊆ Et and a reported belief q̂t, which may differ from their true belief.
The payoff follows the evidence-augmented LMSR: payoff_t(q̂_t, E′_t) = β(R_t) · log q̂_t(ω) − β(R_{t−1}) · log q_{t−1}(ω), where ω is the resolved outcome and q(ω) denotes the probability that a distribution q assigns to ω.
For endogenous resolution, the market resolves once K pieces of evidence have accumulated; the outcome is sampled via a softmax over the fraction of evidence supporting each alternative.
The model formalizes a trader’s belief about resolution as a softmax over adjusted evidence scores that depend on submitted and withheld evidence.
They prove bounded influence of selective evidence withholding on resolution beliefs.
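
The payoff rule in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the exponential schedule in `beta` and its parameters (`beta0`, `decay`, `floor`) are hypothetical choices that only satisfy the stated requirement that liquidity decreases as cumulative evidence quality grows.

```python
import math


def beta(R, beta0=10.0, decay=0.1, floor=1.0):
    """Hypothetical liquidity schedule: positive and decreasing in evidence quality R."""
    return floor + (beta0 - floor) * math.exp(-decay * R)


def payoff(report, prev_price, outcome, R_prev, R_new):
    """Evidence-augmented log-score payoff for one trader.

    report     -- the trader's reported distribution (list of probabilities)
    prev_price -- the market distribution before this trade
    outcome    -- index of the resolved alternative
    R_prev     -- cumulative evidence quality before the trade
    R_new      -- cumulative evidence quality after the trader's evidence is added
    """
    return (beta(R_new) * math.log(report[outcome])
            - beta(R_prev) * math.log(prev_price[outcome]))


prev = [0.5, 0.3, 0.2]

# The trader leaves the price unchanged and adds no evidence: payoff is zero.
no_evidence = payoff(prev, prev, outcome=0, R_prev=2.0, R_new=2.0)

# Same report, but the trader's evidence raises cumulative quality from 2 to 5.
with_evidence = payoff(prev, prev, outcome=0, R_prev=2.0, R_new=5.0)

print(no_evidence, with_evidence)
```

Because log-probabilities are non-positive and `beta` shrinks as quality rises, holding the report fixed and adding evidence can only raise the payoff in this sketch, which matches the positive evidence premium the summary attributes to dynamic liquidity.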

Training Regime: N/A (theoretical work)
Evaluation Protocol:

Theoretical proofs establish incentive compatibility, payoff boundedness, and belief sensitivity bounds.
Key theorems characterize how liquidity parameter choice β(·) and softmax temperature τ trade off resolution sharpness with manipulation resistance.
An example illustrates that withholding evidence shifts beliefs by at most |Et|/(τK).
There are no empirical experiments; the evaluation rests on formal lemmas and proofs.
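
The sensitivity bound can be checked numerically in a simplified setting. The sketch below assumes the resolution belief is a temperature-τ softmax over per-alternative evidence counts divided by K, and that withholding simply removes pieces from those counts. The paper's belief model may differ in detail, and the helper names (`resolution_belief`, `worst_gap`) are illustrative.

```python
import math
import random


def softmax(scores, tau):
    """Numerically stable softmax of scores / tau."""
    scaled = [s / tau for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def resolution_belief(counts, K, tau):
    """Assumed belief model: softmax over evidence counts normalized by K."""
    return softmax([c / K for c in counts], tau)


def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def worst_gap(n=4, K=100, tau=0.5, trials=2000, seed=0):
    """Largest observed L1 shift from withholding, checked against |E_t| / (tau * K)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        base = [rng.randint(0, 10) for _ in range(n)]   # evidence from other traders
        own = [rng.randint(0, 5) for _ in range(n)]     # this trader's evidence E_t
        withheld = [rng.randint(0, o) for o in own]     # pieces the trader hides
        full = [b + o for b, o in zip(base, own)]
        partial = [f - w for f, w in zip(full, withheld)]
        gap = l1(resolution_belief(full, K, tau), resolution_belief(partial, K, tau))
        bound = sum(own) / (tau * K)
        assert gap <= bound + 1e-12, "bound violated under the assumed model"
        worst = max(worst, gap)
    return worst


print(worst_gap(tau=0.5), worst_gap(tau=5.0))
```

In this simplified model the bound holds with room to spare, and raising τ shrinks the achievable shift, at the cost of a flatter and therefore noisier resolution distribution.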

Reproducibility:

The mechanism is fully formalized, but code or dataset release is not applicable.
The paper details algorithms for evidence verification via LLM-as-Judge and market execution.

Example end-to-end: A trader t arrives with a belief vector qt over n alternatives and an evidence set Et. They select a subset E′t to submit along with a reported belief q̂t. The market scoring rule computes their payoff, adjusting the liquidity β according to the cumulative evidence quality R. If resolution is endogenous, once K pieces of evidence have been collected the market samples the outcome ω from a softmax over the normalized evidence-support counts of all alternatives. The trader’s payoff therefore depends on both the belief they report and the evidence they submit. With a sufficiently high temperature τ, the market limits how far selectively withholding parts of Et can shift the trader’s resolution belief (by at most ε). This incentivizes complete, truthful reporting of both evidence and beliefs, bounds the market maker’s loss, and enables endogenous resolution with richer interpretability than classical prediction markets.
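
The endogenous resolution step in this walkthrough can be sketched as follows. The representation is an assumption made for illustration: each accepted evidence piece is reduced to the index of the alternative it supports, and the names `support_fractions` and `resolve` are hypothetical.

```python
import math
import random
from collections import Counter


def support_fractions(evidence_labels, n):
    """Fraction of accumulated evidence supporting each of the n alternatives."""
    counts = Counter(evidence_labels)
    total = len(evidence_labels)
    return [counts.get(i, 0) / total for i in range(n)]


def resolve(evidence_labels, n, tau, rng):
    """Sample the market outcome from a softmax over support fractions."""
    theta = support_fractions(evidence_labels, n)
    weights = [math.exp(t / tau) for t in theta]
    z = sum(weights)
    probs = [w / z for w in weights]
    outcome = rng.choices(range(n), weights=probs, k=1)[0]
    return outcome, probs


# K = 10 evidence pieces: six support alternative 0, three support 1, one supports 2.
labels = [0] * 6 + [1] * 3 + [2] * 1
rng = random.Random(1)

outcome_sharp, probs_sharp = resolve(labels, n=3, tau=0.1, rng=rng)
outcome_flat, probs_flat = resolve(labels, n=3, tau=10.0, rng=rng)
print(probs_sharp, probs_flat)
```

A low temperature concentrates probability on the best-supported alternative, while a high temperature pushes the distribution toward uniform. This is the sharpness versus manipulation-resistance trade-off that the choice of τ controls.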

Technical innovations

Integrating evidence submission directly into prediction markets alongside beliefs, generalizing LMSR mechanisms with evidence-augmented payoffs.
Dynamically adapting LMSR liquidity parameter as a function of cumulative evidence quality to modulate incentives and platform risk.
Formulation of endogenous resolution via softmax sampling over accumulated evidence support fractions rather than requiring external ground truth for resolution.
Mathematical characterization and bounding of how selective evidence withholding shifts traders’ resolution beliefs, enabling ε-DSIC incentives for truthful evidence submission.
Operational proposal for LLM-as-Judge evidence verification combined with staking and dispute resolution addressing verification bottlenecks.

Datasets

LLM evaluation evidence: abstract question-answer pairs that serve as atomic evidence units in the running example, drawn from crowdsourcing-like settings. No concrete dataset is used.

Baselines vs proposed

Classical LMSR: platform loss bounded by β log n, vs evidence-augmented LMSR: platform loss bounded by β(R0) log n with dynamic liquidity.
Standard prediction markets: belief submission only, with exogenous resolution, vs evidence markets: submission of both evidence and beliefs, endogenous or exogenous resolution, and incentive compatibility guarantees.
Fixed-liquidity LMSR vs dynamic-liquidity LMSR: dynamic liquidity yields a positive payoff for submitting evidence, which incentivizes full evidence submission.

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.07434.

Fig 1: High-Level flow of the proposed evidence market. Section 4 studies the connection between belief and

Fig 2: Log Scoring curves under dif-

Limitations

The evidence quality function r(·) is abstract and context-dependent; practical instantiations and verification remain open challenges.
No empirical validation or implementation beyond theoretical modeling; real-world user behavior, collusion, or adversarial manipulation not yet tested.
The endogenous resolution softmax sampling introduces resolution noise and reduced sharpness compared to exogenous resolution.
The model assumes traders cannot fabricate or tamper with evidence and considers only selective withholding, which may be optimistic in some adversarial settings.
LLM-as-Judge evidence verification relies on model accuracy and dispute mechanisms, which may not fully prevent sophisticated manipulation or gaming.
The mechanism's performance and incentive guarantees under distribution shifts or with non-homogeneous trader rationality are not explored.

Open questions / follow-ons

How to design practical, robust evidence quality metrics and verification mechanisms for diverse domains beyond LLM evaluation?
What is the impact of collusion or coordinated withholding/submission of evidence by multiple traders on market integrity and incentive compatibility?
Can the endogenous resolution mechanism be extended to support richer outcome spaces or continuous event spaces beyond discrete alternatives?
How does the mechanism behave when traders have heterogeneous risk preferences, limited rationality, or incomplete knowledge of market rules?

Why it matters for bot defense

Bot-defense practitioners can draw from the evidence market framework to design verification mechanisms that do not solely rely on static, externally verifiable ground truth but instead aggregate crowd-sourced evidence to dynamically assess model or system behavior. This approach can be valuable in CAPTCHA or bot-detection scenarios where ground truth labels may be unavailable or subjective, such as evaluating novel attack vectors or evolving bot strategies. The evidence-augmented LMSR and dynamic liquidity adjustment concepts provide formal incentives to elicit truthful evidence alongside beliefs, helping to detect and penalize malicious participants withholding key information. The proposed LLM-based adjudication with staking could inspire more scalable and decentralized CAPTCHA validation systems integrating human and model judgments. However, the theoretical nature and abstraction mean adaptation for operational bot defense requires work in practical evidence verification, adversarial robustness, and efficient dispute resolution.

Cite

bibtex

@article{arxiv2606_07434,
  title={Evidence Markets},
  author={Safwan Hossain and Gabriel Andrade and Chengqi Zang and Yiling Chen},
  journal={arXiv preprint arXiv:2606.07434},
  year={2026},
  url={https://arxiv.org/abs/2606.07434}
}
