PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers
Source: arXiv:2605.30094 · Published 2026-05-28 · By Boning Li, Baoxiang Wang, Longbo Huang
TL;DR
This paper tackles the longstanding challenge of achieving expert-level play in Heads-Up No-Limit Texas Hold’em (HUNL) poker without relying on computationally expensive equilibrium solvers or training. Prior state-of-the-art methods require millions of core-hours for counterfactual regret minimization (CFR) solvers, while Large Language Models (LLMs) contain extensive poker knowledge but fail to leverage it effectively for actual play. The authors introduce PokerSkill, a novel framework that integrates detailed, human-designed rule-based poker skill prompts as a structured action-grounding interface to guide LLM decision-making. Using a deterministic context engine that extracts high-level game state features and maps these to layered skill libraries, PokerSkill constrains the LLM's action choices to strategically viable moves without any training or solver queries. Empirically, PokerSkill substantially reduces losses against the GTOWizard benchmark—improving GPT-5.5 from −132 to −57 mbb/hand and outperforming the strong solver-based open-source bot Slumbot. This demonstrates for the first time that a zero-shot LLM agent guided by expert domain knowledge can achieve competitive performance in complex imperfect-information poker.
Key findings
- GPT-5.5 XHigh with PokerSkill achieves −57 ± 21 mbb/hand against GTOWizard, a 57% loss reduction from −132 ± 25 mbb/hand with default prompting.
- Claude Opus 4.6 with PokerSkill reduces losses from −204 ± 44 to −80 ± 29 mbb/hand, a 61% relative improvement.
- Claude Opus 4.7 with PokerSkill reduces losses from −170 ± 28 to −87 ± 64 mbb/hand, a 49% improvement.
- All PokerSkill agents outperform the strong solver-based open-source bot Slumbot, which scores −194 ± 41 mbb/hand against GTOWizard.
- The deterministic rule-based PokerSkill agent (no LLM) scores −132 ± 19 mbb/hand, worse than every LLM-guided PokerSkill agent (−57 to −87 mbb/hand), which suggests the gains come from combining the skill library with LLM reasoning rather than from the rules alone.
- PokerSkill skill library encodes ~60 action-line scenarios, 23 hand classes, and 46 bet-size pressure thresholds designed by human poker experts.
- PokerSkill uses a deterministic Context Engine to convert game state to compact labels (board texture, hand strength, action line, position, SPR) for selective skill retrieval.
- Viable action sets filtered by an attack/defense (ATT/DEF) budget system ensure multi-street strategic coherence and prevent invalid or unsound moves.
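The relative improvements quoted in the findings above follow directly from the reported means. A minimal Python check, using only the point estimates from this summary (the function name is illustrative and not from the paper):

```python
def relative_loss_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline loss removed, for negative mbb/hand scores."""
    if baseline >= 0:
        raise ValueError("baseline must be a loss (negative mbb/hand)")
    return (baseline - improved) / baseline


# (default prompt, PokerSkill) mean results in mbb/hand, as reported above.
reported = {
    "GPT-5.5 XHigh": (-132.0, -57.0),
    "Claude Opus 4.6": (-204.0, -80.0),
    "Claude Opus 4.7": (-170.0, -87.0),
}

for model, (default, skill) in reported.items():
    print(f"{model}: {relative_loss_reduction(default, skill):.0%}")
```

The calculation ignores the reported uncertainties, so it describes the point estimates only.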
Threat model
There is no conventional security adversary. The challenge is the complexity and partial observability of imperfect-information poker, faced by a single decision-making agent; the paper considers no adversarial manipulation and no leakage of internal model parameters. The goal is near-equilibrium play without solver queries or training, despite hidden information and multi-street dependencies.
Methodology — deep read
The paper frames the challenge as enabling LLMs to make competitive poker decisions without solver calls or offline learning, despite high game complexity and imperfect information.
No supervised training data is used. Evaluation plays at least 5,000 hands per experiment against GTOWizard, a strong solver-based benchmark opponent that reports AIVAT variance-reduced results and detailed performance metrics in mbb/hand (milli-big-blinds per hand).
The main architecture consists of a deterministic Context Engine that extracts abstract features from the current hand and board state: hand class, board texture, betting action line history, position, stack-to-pot ratio (SPR), and cumulative betting pressure. These labels drive selective prompt retrieval from a layered skill library entirely curated by expert human poker players based on equilibrium theory but encoded as rule-based textual fragments. The skill library has five layers—from stable principles (e.g., value/bluff separation) to board- and hand-specific strategic advice, including blocker-based river guidance.
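The label-then-retrieve flow described above can be sketched as follows. This is a minimal illustration, not the authors' code: the class and function names, the SPR thresholds, the label vocabulary, and every fragment text are assumptions made for the example.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ContextLabels:
    """Compact labels of the kind the Context Engine is described as emitting."""
    hand_class: str
    board_texture: str
    action_line: str
    position: str
    spr_bucket: str


def bucket_spr(effective_stack: float, pot: float) -> str:
    """Bucket the stack-to-pot ratio. Thresholds 3 and 8 are assumptions."""
    if pot <= 0:
        raise ValueError("pot must be positive")
    spr = effective_stack / pot
    if spr < 3:
        return "low"
    if spr < 8:
        return "medium"
    return "high"


# Hypothetical library: (required label values, placeholder fragment text).
# An empty requirement dict stands in for an always-on "stable principle".
SKILL_LIBRARY = [
    ({}, "Principle: keep value bets and bluffs distinct."),
    ({"board_texture": "dry"}, "Placeholder advice for dry boards."),
    ({"hand_class": "top_pair", "spr_bucket": "low"},
     "Placeholder advice for top pair at low SPR."),
    ({"board_texture": "wet", "position": "OOP"},
     "Placeholder advice for wet boards out of position."),
]


def retrieve(labels: ContextLabels) -> list[str]:
    """Return fragments whose required labels all match the current context."""
    context = asdict(labels)
    return [
        text
        for required, text in SKILL_LIBRARY
        if all(context[key] == value for key, value in required.items())
    ]


labels = ContextLabels(
    hand_class="top_pair",
    board_texture="dry",
    action_line="bet-call",
    position="IP",
    spr_bucket=bucket_spr(effective_stack=200, pot=100),
)
for fragment in retrieve(labels):
    print(fragment)
```

Because both the labelling and the matching are pure functions of the game state, the same state always yields the same prompt fragments, which is what makes this stage auditable.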
An attack/defense budget system quantifies the permissible aggression or defense capacity remaining for a hand, decreasing with prior betting pressure across streets. This budget enforces multi-street strategic coherence and constrains the LLM's viable action set at each node. The constrained action list (fold, call, raise sizes) is then fed into the LLM along with context-selected skill fragments and game state, prompting it to produce a structured JSON response specifying the chosen action.
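One way to picture the budget filter is the sketch below. The summary does not give the paper's actual budget units or update rule, so everything here is an assumption: budgets are measured in pot fractions, a raise is checked against the attack budget, a call against the defense budget, and pressure is subtracted linearly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str                   # "fold", "check", "call", or "raise"
    pot_fraction: float = 0.0   # bet faced (call) or bet made (raise), in pots


def viable_actions(legal: list[Action], att_budget: float,
                   def_budget: float) -> list[Action]:
    """Keep legal actions whose size fits the remaining ATT/DEF budget."""
    viable = []
    for action in legal:
        if action.kind == "raise" and action.pot_fraction > att_budget:
            continue  # too aggressive for what this hand has left
        if action.kind == "call" and action.pot_fraction > def_budget:
            continue  # too much pressure to keep defending
        viable.append(action)
    return viable


def spend(budget: float, pressure: float) -> float:
    """Reduce a budget by the pressure already absorbed; never below zero."""
    return max(0.0, budget - pressure)


legal = [Action("fold"), Action("call", 0.75),
         Action("raise", 1.0), Action("raise", 2.5)]
flop_choices = viable_actions(legal, att_budget=1.5, def_budget=1.0)
print([(a.kind, a.pot_fraction) for a in flop_choices])

# After calling a 0.75-pot bet, less defense budget remains on later streets.
remaining_def = spend(1.0, 0.75)
turn_choices = viable_actions([Action("fold"), Action("call", 0.5)],
                              att_budget=1.5, def_budget=remaining_def)
print([(a.kind, a.pot_fraction) for a in turn_choices])
```

Carrying the reduced budget forward is what ties decisions on different streets together: a hand that has already absorbed heavy pressure is offered fewer continuing options later.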
Training per se does not occur; the approach is zero-shot. The LLMs used are GPT-5.5 XHigh and Claude Opus 4.6/4.7, queried via their APIs with temperature 1.0 to accommodate reasoning demands. Response validation ensures legal actions and sizing; fallback to the most conservative legal action occurs under 0.1% of the time.
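The validation-and-fallback step can be sketched as below. The JSON field names (`action`, `size`) and the ordering used to decide which action counts as "most conservative" are assumptions for illustration; the summary does not specify either.

```python
import json

# Assumed ordering of "most conservative" actions; not taken from the paper.
CONSERVATIVE_ORDER = ("check", "fold", "call")


def fallback(viable: list[dict]) -> dict:
    """Pick the most conservative viable action under the assumed ordering."""
    for kind in CONSERVATIVE_ORDER:
        for option in viable:
            if option["action"] == kind:
                return option
    return viable[0]


def parse_action(raw: str, viable: list[dict]) -> dict:
    """Return the model's choice if it matches a viable option, else fall back."""
    try:
        choice = json.loads(raw)
    except json.JSONDecodeError:
        choice = None
    if isinstance(choice, dict):
        for option in viable:
            same_kind = choice.get("action") == option["action"]
            same_size = choice.get("size") == option.get("size")
            if same_kind and same_size:
                return option
    return fallback(viable)


viable = [{"action": "fold"}, {"action": "call"},
          {"action": "raise", "size": 0.75}]
print(parse_action('{"action": "raise", "size": 0.75}', viable))
print(parse_action("I think I should raise here", viable))
```

Matching the response against the pre-filtered list, rather than trusting free-form text, means an illegal size or malformed reply can never reach the game engine.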
Evaluation compares PokerSkill against the GTOWizard benchmark over thousands of hands with AIVAT variance reduction, reporting mean mbb/hand loss and standard errors. The study includes default prompt baselines (LLMs without PokerSkill), a rule-based only baseline, and Slumbot as a solver-based reference.
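For readers unfamiliar with the metric, the sketch below shows how a mean in mbb/hand and its standard error are computed from per-hand values. It assumes per-hand results expressed in big blinds; with AIVAT, the raw winnings would be replaced by variance-reduced per-hand estimates before this aggregation, a step not reproduced here. The function name and sample data are illustrative.

```python
import math
import statistics


def mbb_per_hand(results_bb: list[float]) -> tuple[float, float]:
    """Return (mean, standard error) in milli-big-blinds per hand."""
    if len(results_bb) < 2:
        raise ValueError("need at least two hands to estimate a standard error")
    scaled = [1000.0 * r for r in results_bb]   # big blinds -> mbb
    mean = statistics.fmean(scaled)
    std_err = statistics.stdev(scaled) / math.sqrt(len(scaled))
    return mean, std_err


# Toy per-hand results in big blinds (made-up numbers, not from the paper).
mean, std_err = mbb_per_hand([1.0, -1.0, 0.5, -0.9])
print(f"{mean:.0f} ± {std_err:.0f} mbb/hand")
```

Since the standard error shrinks with the square root of the number of hands, variance reduction matters: it tightens the interval without requiring far more hands.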
Reproducibility is supported by a public code release. Although the PokerSkill skill library is human-authored, any researcher with access to the same LLM APIs can rerun the experiments. Every component except the LLM calls is deterministic, which keeps the pipeline auditable and extensible; because the LLM calls are stochastic, individual runs will not match hand for hand.
Technical innovations
- A deterministic Context Engine that translates raw poker game states into compact labels (board texture, hand class, action line, SPR, position) for selective retrieval of skill fragments.
- A layered, human-expert-designed skill library encoding detailed poker theory as structured prompt fragments enabling context-specific LLM action grounding without training or solver calls.
- An attack/defense budget system that encodes multi-street strategic constraints into scalar budgets governing action viability, maintaining global strategic coherence.
- Structured, constrained action grounding where LLMs select only among viable, rule-filtered actions, reducing hallucinations and invalid moves in zero-shot poker play.
Datasets
- GTOWizard benchmark — ≥5,000 hands per evaluation — public benchmark for poker AI with variance reduction.
Baselines vs proposed
- GPT-5.5 XHigh Default Prompt: −132 ± 25 mbb/hand vs PokerSkill: −57 ± 21 mbb/hand
- Claude Opus 4.6 Default Prompt: −204 ± 44 mbb/hand vs PokerSkill: −80 ± 29 mbb/hand
- Claude Opus 4.7 Default Prompt: −170 ± 28 mbb/hand vs PokerSkill: −87 ± 64 mbb/hand
- Rule-based (no LLM) PokerSkill Only: −132 ± 19 mbb/hand vs LLM PokerSkill agents ~ −57 to −87 mbb/hand
- Slumbot solver-based bot (2018 ACPC champion): −194 ± 41 mbb/hand vs PokerSkill GPT-5.5: −57 ± 21 mbb/hand
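The ± values matter when reading these comparisons. A rough way to gauge how well separated two results are is to divide their difference by the combined standard error. The sketch below assumes the ± figures are standard errors of independent estimates, which the summary implies but does not confirm; the function name is illustrative.

```python
import math


def difference_z(mean_a: float, se_a: float,
                 mean_b: float, se_b: float) -> float:
    """z-score of (mean_a - mean_b), assuming independent estimates."""
    return (mean_a - mean_b) / math.hypot(se_a, se_b)


# (PokerSkill mean, SE, comparison mean, SE) in mbb/hand, as listed above.
comparisons = {
    "GPT-5.5 XHigh vs default prompt": (-57, 21, -132, 25),
    "Claude Opus 4.6 vs default prompt": (-80, 29, -204, 44),
    "Claude Opus 4.7 vs default prompt": (-87, 64, -170, 28),
    "GPT-5.5 XHigh vs Slumbot": (-57, 21, -194, 41),
}

for label, (m_a, s_a, m_b, s_b) in comparisons.items():
    print(f"{label}: z = {difference_z(m_a, s_a, m_b, s_b):.2f}")
```

A larger z means a clearer separation. Comparisons with wide intervals, such as the ± 64 on Claude Opus 4.7, are correspondingly less conclusive.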
Limitations
- No adversarial or out-of-distribution opponent evaluation beyond GTOWizard; performance against human opponents or adaptive bots is unclear.
- LLMs underlying PokerSkill remain closed-source APIs, limiting direct model introspection or training reproducibility.
- Evaluation limited to GTOWizard environment and specific LLM versions current as of early 2026; results may shift with future model updates.
- The skill library and budgets are fixed and manually designed; scalability and adaptability to variant poker formats or dynamic opponent styles untested.
- PokerSkill narrows but does not close the gap to Nash-equilibrium play: every agent still loses to GTOWizard, and the standard errors remain large (up to ± 64 mbb/hand).
- Error handling falls back to the most conservative legal action when validation fails, which may limit exploratory play or innovation.
Open questions / follow-ons
- Can the PokerSkill approach generalize to other imperfect-information games with similarly complex strategic depth?
- How robust is the system to opponents adapting or deviating from equilibrium lines, i.e., can LLM-guided agents exploit suboptimal players while maintaining defense?
- Might tighter integration or dynamic updating of the skill library improve adaptation without retraining or solver access?
- Could interpretability or explainability of LLM decisions be improved further given the structured intermediate context representations?
Why it matters for bot defense
For bot-defense and CAPTCHA practitioners, PokerSkill demonstrates a compelling paradigm where external rule-based, expert system knowledge can be combined with large foundation models to produce contextually sound decisions in highly complex, multi-step, and imperfect-information environments without expensive training or solver dependence. This structured prompt augmentation approach could inspire methods to constrain and guide LLM outputs for security-critical decision tasks, ensuring compliance and reducing hallucination or irrelevant reasoning. The paper’s emphasis on deterministic context extraction and action filtering highlights the importance of tightly controlling LLM action spaces in adversarial or safety-sensitive applications.
Furthermore, PokerSkill’s demonstration that zero-shot LLMs can be elevated to near-expert levels in a game as challenging as HUNL underlines the potential risks and opportunities for LLM-enabled automated agents in online platforms. The principles of layered domain knowledge retrieval and context-specific bounding of choices may transfer to bot-detection or challenge generation, where human expertise encoded as prompt scaffolding can amplify a model’s reliability and interpretability while avoiding costly fine-tuning or solver access.
Cite
@article{arxiv2605_30094,
title={PokerSkill: LLMs Can Play Expert-Level Poker without Training or Solvers},
author={Boning Li and Baoxiang Wang and Longbo Huang},
journal={arXiv preprint arXiv:2605.30094},
year={2026},
url={https://arxiv.org/abs/2605.30094}
}