Mazocarta: A Seeded Procedural Deckbuilder for Instrumented Game Development

Source: arXiv:2605.08319 · Published 2026-05-08 · By Timothy C. Cogan

TL;DR

Mazocarta is not primarily a new deckbuilder mechanic; it is an engineering artifact meant to show that a procedural game can be built so that one rules core drives play, simulation, tests, save/load fixtures, and local multiplayer. The problem it targets is a familiar one in procedural deckbuilders: balance and regression checks are hard because outcomes emerge from many interacting systems over many seeded runs. Mazocarta responds by making determinism and instrumentation first-class, rather than bolting on a separate analytics model after the fact.

The new part is the shared production Rust rules core, compiled to WebAssembly for browser play and also executable natively for fixed-seed actor simulation. Around that core, the author adds an end-to-end verification surface (unit, save/load, Playwright, fake-camera QR tests, soak runs) and a QR-mediated WebRTC pairing flow for local multiplayer with no hosted game server. The evaluation is intentionally modest: over 1,000 deterministic seeds, the single-player autoplay actor won 36.1% of runs and the two-player actor won 34.9%, which the paper frames as repeatable development probes rather than player-balance claims.

Key findings

The Rust rules core is shared across browser play, native simulation, save/load fixtures, browser E2E tests, and multiplayer, so failures can be attributed to the same production state transitions rather than to a separate simulation model.
On a clean checkout of commit 65818ed3ba84942efbbe950099def180f7e3c9ac, the rule test suite reported 466 passed / 0 failed, and the save/load test subset reported 25 passed / 0 failed.
The browser verification surface passed 23 Playwright E2E and QR tests with 0 failures, including host/guest pairing and fake-camera QR decode paths.
The multiplayer soak smoke command (npm run soak:2p -- --runs 5 --seed-start 1) reported 0 stalls; all 5 sessions hit progress timeouts while still advancing, which the paper treats as a smoke-level synchronization signal rather than completion evidence.
Over 1,000 deterministic seeds, the single-player actor recorded 361 wins and 639 losses with 0 aborts, a 36.1% win rate.
Over the same 1,000-seed range, the two-player actor recorded 349 wins and 651 losses with 0 aborts, a 34.9% win rate.
In single-player autoplay, completed runs cleared on average 14.85 combats, 1.65 elites, and 2.15 bosses, with average victory HP of 37.85.
In two-player autoplay, completed runs cleared on average 14.03 combats, 1.49 elites, and 2.03 bosses, with average party HP of 78.98 and 2.00 surviving heroes at victory.
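The soak finding above separates "stalls" from "progress timeouts while still advancing". The summary does not say how the harness draws that line, so the following Python sketch is only one plausible reading: a session counts as stalled when it goes longer than a chosen window without any state advance, and as a progress timeout when the time budget runs out while advances were still arriving. The names `Outcome`, `classify`, `budget`, and `stall_window` are hypothetical and do not come from the Mazocarta repository.

```python
from enum import Enum


class Outcome(Enum):
    COMPLETED = "completed"
    STALL = "stall"
    PROGRESS_TIMEOUT = "progress_timeout"


def classify(progress_times: list[float], finished: bool,
             budget: float, stall_window: float) -> Outcome:
    """Classify one soak session (assumed definitions, not the paper's).

    progress_times: sorted timestamps in seconds, within [0, budget],
    at which the session state advanced.
    """
    if finished:
        return Outcome.COMPLETED
    # Measure every gap between advances, including the gap from the
    # start of the session and the gap up to the end of the budget.
    marks = [0.0, *progress_times, budget]
    longest_gap = max(later - earlier for earlier, later in zip(marks, marks[1:]))
    if longest_gap > stall_window:
        return Outcome.STALL
    return Outcome.PROGRESS_TIMEOUT
```

Under this reading, a session that keeps advancing every few seconds but never reaches the end of the run within the budget is reported as a progress timeout, which matches how the paper treats the result: a synchronization signal, not completion evidence.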

Threat model

The paper’s effective threat model is a development pipeline under regression risk: the system must remain deterministic and comparable across code changes, seeds, and execution surfaces. The adversary is not a remote attacker but implementation drift, content changes, and simulation/interactive mismatches that would make balance probes and tests unreliable. In multiplayer, peers are assumed to be honest local browsers exchanging signaling and state over WebRTC; the paper does not analyze cheating, tampering, or hostile network peers in depth.

Methodology — deep read

Mazocarta’s threat model is primarily an engineering one, not an attacker-evasion model: the paper is about making procedural game rules deterministic and inspectable under changes, not about defending a networked service from a malicious remote adversary. The implicit “adversary” is entropy, regression, and mismatch between test-time simulation and player-facing execution. The paper assumes the game author controls the production codebase and wants seeds, saved state, combat resolution, map generation, rewards, and multiplayer synchronization to be reproducible from serialized inputs. For multiplayer, the system assumes local-area browser peers can establish a WebRTC connection after exchanging signaling data, but it explicitly does not remove WebRTC’s normal connectivity constraints (NAT, browser support, ICE behavior).
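The reproducibility assumption can be made concrete with a toy example. The sketch below is written in Python for brevity (the real core is Rust), and every name in it, including `RunState`, `step`, and the JSON layout, is hypothetical and not taken from the Mazocarta repository. It shows a rules core whose only source of randomness is RNG state carried inside the serialized state, so a run that is saved and reloaded partway through ends in the same state as an uninterrupted one.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RunState:
    """Hypothetical toy state; the real game state is far richer."""
    rng: int    # RNG state is part of the saved state
    hp: int
    floor: int


def next_rng(x: int) -> int:
    """One 64-bit linear congruential step (Knuth's MMIX constants)."""
    return (x * 6364136223846793005 + 1442695040888963407) % (1 << 64)


def step(state: RunState) -> RunState:
    """One deterministic transition: damage depends only on the state."""
    rng = next_rng(state.rng)
    damage = (rng >> 33) % 4  # 0..3, taken from the high bits
    return RunState(rng=rng, hp=state.hp - damage, floor=state.floor + 1)


def run(state: RunState, steps: int) -> RunState:
    for _ in range(steps):
        state = step(state)
    return state


def save(state: RunState) -> str:
    return json.dumps(asdict(state), sort_keys=True)


def load(blob: str) -> RunState:
    return RunState(**json.loads(blob))


if __name__ == "__main__":
    start = RunState(rng=1, hp=40, floor=0)
    straight = run(start, 10)
    resumed = run(load(save(run(start, 4))), 6)
    print(straight == resumed)
```

The design point is that nothing outside the serialized state (wall-clock time, a process-global RNG, iteration order of unordered containers) feeds into `step`, which is what lets a save/load fixture double as a regression test.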

The data used for evaluation is not a labeled human-play dataset; it is the game itself under deterministic seeds. The paper reports results from a clean checkout of repository commit 65818ed3ba84942efbbe950099def180f7e3c9ac using Rust 1.93.1, Node.js 24.6.0, npm 11.12.0, and Playwright 1.59.1. For the reproducibility snapshot, the author ran 1,000 deterministic seeds for single-player actor simulation (cargo run --bin actor -- --runs 1000 --seed-start 1) and 1,000 deterministic seeds for two-player simulation (cargo run --bin actor -- --players 2 --runs 1000 --seed-start 1). The paper also reports test counts from the development harness: cargo test -q for core rules, cargo test -q save for save/load fixtures, make test-e2e for browser and QR tests, and a soak smoke command for multiplayer (npm run soak:2p -- --runs 5 --seed-start 1). No train/validation split exists because this is not a learned model or supervised dataset paper.
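The `--runs` and `--seed-start` flags in the commands above describe a sweep over a contiguous seed range. A minimal Python sketch of that shape follows, with a stand-in `play_seed` function that is explicitly not the Mazocarta rules: it is a made-up attrition game used only so the sweep has something deterministic to tally. The names `play_seed` and `sweep` are assumptions for illustration.

```python
import random
from collections import Counter


def play_seed(seed: int) -> str:
    """Stand-in for one full actor run; NOT the Mazocarta rules."""
    rng = random.Random(seed)  # private RNG, so runs do not interfere
    hp = 20
    for _ in range(10):
        hp -= rng.randint(0, 4)
        if hp <= 0:
            return "loss"
    return "win"


def sweep(runs: int, seed_start: int = 1) -> Counter:
    """Tally outcomes over seeds seed_start .. seed_start + runs - 1."""
    return Counter(play_seed(seed) for seed in range(seed_start, seed_start + runs))


if __name__ == "__main__":
    print(sweep(runs=1000, seed_start=1))
```

Because each seed gets its own `random.Random` instance, re-running the sweep on the same code yields the same tally, and a changed tally after a code change points at the change rather than at sampling noise.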

Architecturally, Mazocarta is a tactical card game with seeded procedural progression over an N-sector run structure, branching map nodes, visible enemy intents, shops, events, rest nodes, elites, and bosses. The notable game-mechanics design choice is the use of three signed axes: Focus (scales outgoing attack damage), Rhythm (scales shield gain), and Momentum (scales energy and enemy intent output). These axes are content-defined interaction variables rather than a hidden global rock-paper-scissors table. Cards, enemies, rewards, modules, and requirements can read from and write to these axes, scale effects from them, or replace ordinary effects. The presentation is tile-based and responsive, with stable geometric units that reflow for hand size and viewport changes, which makes state easier to test and to keep readable on mobile browsers. For multiplayer, the system uses standard WebRTC data channels for transport, but QR codes or manual paste provide the signaling path. The signaling bridge serializes offer/answer payloads, can compress and frame them for QR display, and lets a host and guest exchange connection data without a central server. After connection, the host broadcasts scene and party state while each participant maintains per-hero state.
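To illustrate how signed axes can act as content-readable scaling variables, here is a small Python sketch. The summary does not give the actual scaling formula, so the linear rule below (each axis point shifts the effect by a quarter of its base value, floored at zero) is purely an assumption, as are the names `Axes`, `scaled`, `attack_damage`, and `shield_gain`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axes:
    """Three signed axes; values may be negative."""
    focus: int = 0     # scales outgoing attack damage
    rhythm: int = 0    # scales shield gain
    momentum: int = 0  # scales energy and enemy intent output


def scaled(base: int, axis: int) -> int:
    """Assumed rule: +/-25% of base per axis point, never below zero.

    Integer arithmetic keeps the result identical on every platform.
    """
    return max(0, base * (4 + axis) // 4)


def attack_damage(base: int, axes: Axes) -> int:
    return scaled(base, axes.focus)


def shield_gain(base: int, axes: Axes) -> int:
    return scaled(base, axes.rhythm)
```

With this assumed rule, `attack_damage(8, Axes(focus=2))` evaluates to 12 and `shield_gain(8, Axes(rhythm=-2))` to 4. Sticking to integer math is a deliberate choice in a deterministic core, since floating-point rounding is a common source of cross-platform drift.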

The simulation algorithm is a heuristic autoplay actor, not a learned policy. The actor starts each run from a fixed seed, walks the procedural map, resolves rewards, shops, and events, and then plays combat by scoring legal actions. The heuristic prefers victory, defensive survival, useful zero-cost plays, setup actions, and positive trades. This policy is intentionally simple so that a failed or stalled run usually indicates a concrete rule, content, or balance issue in the production game rather than a mismatch between a surrogate model and the real game. For example, in a single deterministic seed run, the actor would select a map node, enter combat, evaluate legal cards using the scoring heuristic, advance the exact same state transitions that the browser client uses, and continue until it either wins the run, loses, or hits a simulator guardrail. The paper emphasizes that this is a probe, not an optimal player model; it is meant to be stable and cheap, not strategically strong.
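The scoring step can be sketched as follows. This is a hypothetical Python illustration of a priority-ordered heuristic, not the author's actor: the `Action` fields, the weights, and the tie-break rule are all assumptions chosen to mirror the preference order listed above (victory, defensive survival, useful zero-cost plays, setup actions, positive trades).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical card summary; real cards carry far more state."""
    name: str
    cost: int
    damage: int = 0
    shield: int = 0
    setup: bool = False


def score(action: Action, enemy_hp: int, incoming: int) -> int:
    """Assumed weights that follow the priority order in the text."""
    total = 0
    if action.damage >= enemy_hp:
        total += 1000                              # victory
    total += 10 * min(action.shield, incoming)     # only block that prevents damage
    if action.cost == 0 and (action.damage or action.shield or action.setup):
        total += 5                                 # useful zero-cost play
    if action.setup:
        total += 3                                 # setup action
    total += action.damage - action.cost           # rough trade value
    return total


def choose(actions: list[Action], energy: int,
           enemy_hp: int, incoming: int) -> Optional[Action]:
    """Pick the best affordable action; ties break on name for determinism."""
    legal = [a for a in actions if a.cost <= energy]
    if not legal:
        return None
    return max(legal, key=lambda a: (score(a, enemy_hp, incoming), a.name))
```

The explicit tie-break matters more than the weights: without it, two equally scored actions could be chosen in an order that depends on container iteration, which would undermine fixed-seed comparisons.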

Evaluation is mostly verification plus aggregate autoplay statistics. The author reports 466 passing rule tests, 25 passing save/load tests, 23 passing Playwright E2E/QR tests, and a 0-stall result in a 5-run multiplayer soak smoke test, but notes that the soak still produced progress timeouts while the sessions advanced. For the actor-based balance snapshot, the key metrics are completed-run win rate and mean progression counts over completed runs only. Single-player autoplay achieved 361 wins / 639 losses / 0 aborts, with averages of 14.85 combats, 1.65 elites, 2.15 bosses, and 37.85 victory HP. Two-player autoplay achieved 349 wins / 651 losses / 0 aborts, with averages of 14.03 combats, 1.49 elites, 2.03 bosses, 78.98 party HP at victory, and 2.00 surviving heroes. The paper does not report confidence intervals, statistical significance tests, or ablations comparing alternative heuristics; it explicitly says the current snapshot is for reproducible signals, not final balance claims.
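The paper reports raw counts without intervals. As a hedged illustration of what a follow-up analysis could compute from those counts, the Python sketch below derives the win rate and a Wilson score interval. Treating the 1,000 seeds as independent samples from a larger seed population is an assumption made here, not a claim from the paper, and the function names are hypothetical.

```python
import math


def win_rate(wins: int, runs: int) -> float:
    return wins / runs


def wilson_interval(wins: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = wins / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return center - half, center + half


if __name__ == "__main__":
    for label, wins in (("single-player", 361), ("two-player", 349)):
        low, high = wilson_interval(wins, 1000)
        print(f"{label}: {win_rate(wins, 1000):.3f} ({low:.3f}, {high:.3f})")
```

Under that independence assumption, each interval is roughly three percentage points wide on either side of the point estimate, which supports reading the 36.1% and 34.9% figures as development probes and not as a measured difference between modes.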

Reproducibility is relatively strong at the artifact level: the paper names the exact repository commit, the toolchain versions, and the commands used to generate the checks and actor results. It also states the repository is MIT licensed and public. What is missing is deeper experimental reproducibility in the statistical sense: there is no multi-policy comparison, no random-seed sensitivity analysis beyond the fixed seed range reported, and no released frozen dataset because the artifact is the game itself. The paper’s concrete claim is that by making simulation and play share one rules core, future changes can be tested against the same executable logic that players use, which is the central methodological point.

Technical innovations

A single Rust rules core is reused across browser gameplay, native fixed-seed simulation, save/load fixtures, E2E tests, and multiplayer synchronization, instead of maintaining a separate analytics simulator.
QR- or paste-mediated WebRTC signaling is integrated into a local-first browser game so two peers can establish direct play without a hosted matchmaking or game-state server.
The autoplay pipeline is intentionally deterministic and heuristic, making it suitable as a regression probe for balance shifts, unreachable states, and logic freezes on the exact production rules.
The multiplayer state model separates shared dungeon progression from per-hero run state, which lets one map route coexist with individual combat and reward consequences.
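The QR- or paste-mediated signaling path in the list above involves serializing an offer or answer, compressing it, and framing it for display. The summary does not describe the actual wire format, so the Python sketch below is an assumed one: zlib compression, URL-safe base64, and frames of the form `index/total:data`. The function names and the `chunk_size` parameter are hypothetical.

```python
import base64
import zlib


def frame_payload(sdp: str, chunk_size: int = 200) -> list[str]:
    """Compress, base64url-encode, and split into 'i/n:data' frames."""
    packed = base64.urlsafe_b64encode(zlib.compress(sdp.encode("utf-8"))).decode("ascii")
    chunks = [packed[i:i + chunk_size] for i in range(0, len(packed), chunk_size)]
    total = len(chunks)
    return [f"{index + 1}/{total}:{chunk}" for index, chunk in enumerate(chunks)]


def unframe_payload(frames: list[str]) -> str:
    """Reassemble frames scanned in any order; raise if any are missing."""
    parts: dict[int, str] = {}
    total = None
    for frame in frames:
        header, data = frame.split(":", 1)
        index, count = (int(field) for field in header.split("/"))
        if total is None:
            total = count
        elif count != total:
            raise ValueError("frames come from different payloads")
        parts[index] = data
    if total is None or set(parts) != set(range(1, total + 1)):
        raise ValueError("missing frames")
    packed = "".join(parts[i] for i in range(1, total + 1))
    return zlib.decompress(base64.urlsafe_b64decode(packed)).decode("utf-8")
```

The base64url alphabet contains neither `:` nor `/`, so the header can be split off unambiguously. Numbering the frames lets a camera pick them up in any order, and the same strings work for the manual paste fallback.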

Datasets

1,000 deterministic single-player seeds — 1,000 runs — generated from the Mazocarta rules core / seed-start 1
1,000 deterministic two-player seeds — 1,000 runs — generated from the Mazocarta rules core / seed-start 1

Baselines vs proposed

No separate baseline figures are given; the numbers below are the paper’s own verification and autoplay results.
Rust rule tests: 466 passed, 0 failed
Save/load tests: 25 passed, 0 failed
Browser E2E and QR tests: 23 passed, 0 failed
Multiplayer soak smoke: 0 stalls, 5 progress timeouts while still advancing
Single-player actor: win rate = 36.1% (361/1000)
Two-player actor: win rate = 34.9% (349/1000)

Limitations

The evaluation is a snapshot over deterministic seeds, not a statistical study; there are no confidence intervals, hypothesis tests, or multiple independent runs reported.
The autoplay policy is hand-written and deliberately simple, so it may badly underrepresent human play, especially in two-player coordination.
No ablation study isolates which part of the architecture (determinism, QR pairing, shared rules core) contributes most to the observed reproducibility.
The multiplayer smoke test is weak evidence: 5 runs with progress timeouts do not establish reliable completion behavior or real-world usability.
There is no user study, comparative genre benchmark, or long-horizon balance analysis across content revisions.
WebRTC and QR pairing still depend on practical browser/device constraints, camera access, ICE behavior, and local network conditions.

Open questions / follow-ons

How much of the reproducibility signal comes from deterministic rules alone versus the specific heuristic actor policy, and how would alternative policies change the balance probes?
Can the shared-rules-core approach scale to stronger statistical evaluation, such as confidence intervals over seed ranges, failure classification, and policy ensembles?
How robust is QR-mediated WebRTC pairing across real devices, camera quality, payload sizes, and different browser/network environments?
Would human playtests correlate with the autoplay-derived signals, or do the current win rates mostly reflect heuristic weakness rather than game balance?

Why it matters for bot defense

For a bot-defense or CAPTCHA practitioner, the key lesson is architectural rather than model-specific: if you need reliable automated evaluation of a live system, make the test harness execute the same production logic as the user-facing path. Mazocarta shows how deterministic seeds, shared rules, and end-to-end fixtures can turn a complex interactive system into something you can regression-test, smoke-test, and compare over time without maintaining a second approximation layer.

The QR/WebRTC part is also relevant as an example of local-first challenge-response plumbing, but not as a direct CAPTCHA design. The important part is the separation of signaling from state and the use of a simple, inspectable transport for pairing. A bot-defense engineer can take from this the value of reducing divergence between human and automated paths, instrumenting the real production state machine, and treating reproducibility as a first-class security/debugging primitive. The paper does not provide a CAPTCHA scheme or anti-bot metric; it mainly demonstrates how to build a system that is easier to probe honestly and harder to accidentally desynchronize.

Cite

bibtex

@article{arxiv2605_08319,
  title={Mazocarta: A Seeded Procedural Deckbuilder for Instrumented Game Development},
  author={Timothy C. Cogan},
  journal={arXiv preprint arXiv:2605.08319},
  year={2026},
  url={https://arxiv.org/abs/2605.08319}
}
