Jas: AI-Paired Engineering as a Revival of N-Version Programming

Source: arXiv:2606.07828 · Published 2026-06-05 · By Jason Hickey

TL;DR

This paper presents a detailed case study of how AI-assisted paired engineering, combined with two safeguards (a precise executable YAML specification as a single source of truth, and multiple parallel implementations), revives N-version programming for modern software development. The author, working as a single developer, completed five implementations of a complex vector illustration application across varied platforms and languages (Rust, Swift, OCaml, Python, browser) in approximately 120 evening hours over seven weeks. The central innovation is using AI to automate the bulk of per-port mechanical coding, while the executable specification consolidates core behavioral logic and UI definitions, and the multiple implementations act as mutual differential testers that catch both bugs and underspecified behavior.

The project shows that work which traditionally required multiple developer-years can be compressed, with this AI-assisted methodology, into a single-developer timeline without severe correctness compromises. The paper quantifies cost savings via sub-linear amortization of the specification across implementations and emphasizes the role of manual testing and divergence exploration in surfacing ambiguous or incomplete specs. It candidly discusses limitations such as the remaining manual effort and AI hallucinations. Overall, the work reframes AI-augmented multi-version programming as a practical approach for cross-platform, correctness-sensitive engineering workloads, supported by an open-source artifact and a reusable testing protocol.

Key findings

Five working implementations of a complex vector illustration app were developed by one developer in roughly 120 evening hours (approx. 3-4 hours per evening over 7 weeks).
The shared executable YAML specification totals ~23,000 lines, while per-port native escape-hatch code ranges from 0 lines (OCaml) to over 95,000 lines (Rust).
The shared YAML specification and interpreter together total ~35,000 lines, and the per-port native code totals ~300,000 lines across all ports.
Manual testing dominates the remaining developer effort, exceeding both spec-writing and per-port coding time; the project contains ~4,600 automated test functions and 36 manual-test transcript files.
Differential testing across the 5 implementations exposed gaps in the specification within hours, e.g., bugs in hue preservation when saturation reaches zero, CMYK channel handling, and commit semantics for recent colors.
The cost of the correctness gained from N implementations grows sub-linearly, with the majority of specification-completeness bugs caught with 2 to 3 implementations.
The AI-paired engineering approach reduces conventional multi-developer-year vector app development to single-developer multi-week effort, shifting the economics underlying N-version programming viability.
The Color Panel feature’s shared YAML is 890 lines, supporting five visually equivalent implementations with per-port native code ranging from 0 to 1,309 lines.

Methodology — deep read

Threat model & assumptions: The paper does not focus on adversarial threats but rather on reliable correctness across diverse platforms and language implementations. The assumptions include a single developer assisted by AI, with the AI subject to hallucination and drift, mitigated by human-in-the-loop manual and automated testing. The adversary is essentially complexity and specification ambiguity rather than a hostile actor.
Data: The data consists of the executable YAML specification (~23,000 lines) describing panels, dialogs, tools, menus, and state models for the vector illustration app. There are five separate native implementations in Rust (Dioxus), Swift (SwiftUI), OCaml (GTK), Python (PySide6), and Python+Flask (HTML/JS). Across the project, approximately 4,600 automated test functions and 36 manual-test transcript files were authored. The specification is the master source; ports interpret it and also contain native code escape hatches for platform-specific needs.
Architecture/algorithm: The approach centers on a shared, executable YAML specification that is interpreted by a generic UI interpreter with a single design (written in Python, then reused or ported to the other languages). This interpreter reads the declarative YAML definitions of UI and behavior and constructs native UI widgets via renderers specific to each platform. When YAML's expressive power is insufficient (e.g., for immediate-mode graphics or platform-native reactive state models), native escape-hatch code supplements the interpreter. The differential-testing aspect treats the five independently rendered interfaces as cross-checks that expose divergence between ports and gaps in the specification.
Training regime: Not an ML training paper per se, but the AI assistant (Claude Code) is used for code generation, review, and prompting through iterative paired-engineering loops. The development proceeded over roughly 48 calendar days with 1,807 commits, totaling an estimated 120–160 evening hours. Iteration includes prompt-driven revision of design docs, YAML specifications, multi-port implementations, and manual/automated testing.
Evaluation protocol: Metrics are primarily qualitative correctness and developer effort comparisons. Correctness is evaluated through automated test suites for state divergences, and extensive manual GUI behavioral and visual testing through scripted transcripts. Cross-port visual equivalence and behavior parity serve as proxies for specification completeness and implementation correctness, supplemented by observations of regression frequency and bug discovery rate. No formal statistical testing or adversarial robustness evaluation is reported.
Reproducibility: The full source code, specification, and manual test protocols are openly published at https://github.com/jyh/jas. Some complexity remains in replicating AI-assisted development sessions exactly, given AI prompt histories, memory files, and interactive dialog with AI assistants (Claude Code). The YAML specs and interpreter engines are public, enabling reproduction of the basic methodology and ported implementations.
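The interpreter-plus-escape-hatch split described under Architecture/algorithm can be sketched roughly as follows. This is a minimal illustration, not the Jas interpreter: the spec is assumed to be already parsed into a Python dict, the key names (`type`, `label`, `children`, `hook`) are hypothetical, and a plain-text renderer stands in for a real platform toolkit.

```python
from typing import Callable, Dict, List

# Hypothetical, already-parsed spec fragment. The real project stores its
# specification as YAML; the key names used here are illustrative assumptions.
SPEC = {
    "type": "panel",
    "label": "Color",
    "children": [
        {"type": "slider", "label": "Hue"},
        {"type": "slider", "label": "Saturation"},
        {"type": "native", "hook": "color_wheel"},  # escape hatch
    ],
}

# Escape hatches: per-port native code registered under a name.
NativeHook = Callable[[], str]


def render(node: dict, hooks: Dict[str, NativeHook], depth: int = 0) -> List[str]:
    """Walk the spec tree and emit one line of text per widget."""
    indent = "  " * depth
    kind = node["type"]
    if kind == "native":
        # The declarative spec cannot express this widget; defer to the port.
        hook = hooks.get(node["hook"])
        if hook is None:
            raise KeyError(f"port has no native hook named {node['hook']!r}")
        return [indent + hook()]
    lines = [f"{indent}{kind}: {node['label']}"]
    for child in node.get("children", []):
        lines.extend(render(child, hooks, depth + 1))
    return lines


# A stand-in "port": a text renderer with one native escape hatch.
text_port_hooks = {"color_wheel": lambda: "<custom-drawn color wheel>"}
output = render(SPEC, text_port_hooks)
print("\n".join(output))
```

In this sketch a port that needs no native code would simply register no hooks for a panel, which mirrors the reported case of a port contributing zero native lines for a feature.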

An end-to-end example is the Color Panel: defined by 890 lines of declarative YAML controlling layout, slider modes, color swatches, bindings, and dialog state across five ports. The OCaml port uses zero lines of native code here, Swift adds 59 lines for state bridging, Python 123 for custom widgets, Rust 1,309 for immediate-mode rendering. Manual testing across ports quickly revealed underspecifications (e.g., hue preservation at saturation zero), which were refined in the YAML and propagated. This cycle of spec-authoring, AI-assisted implementation in ports, and multi-port visual/manual cross-checking embodies the methodology.
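To make the hue-preservation example concrete, the sketch below shows one way such a divergence can arise. The two "port" behaviors are hypothetical illustrations, not code from the Jas ports: one keeps HSV values in its state, the other keeps only RGB and re-derives HSV on every edit. Only Python's standard `colorsys` module is used.

```python
import colorsys


def set_saturation_hsv_state(hsv, new_s):
    """Hypothetical port A: state is stored as HSV, so hue survives s == 0."""
    h, _, v = hsv
    return (h, new_s, v)


def set_saturation_rgb_state(rgb, new_s):
    """Hypothetical port B: state is stored as RGB and HSV is re-derived."""
    h, _, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(h, new_s, v)


start_hsv = (0.6, 1.0, 0.8)  # a blue hue, fully saturated

# Port A: drag saturation to zero, then back to full.
a = set_saturation_hsv_state(start_hsv, 0.0)
a = set_saturation_hsv_state(a, 1.0)
rgb_a = colorsys.hsv_to_rgb(*a)

# Port B: same user actions, but the gray intermediate color carries no hue,
# so rgb_to_hsv reports hue 0.0 and the color comes back red.
b = colorsys.hsv_to_rgb(*start_hsv)
b = set_saturation_rgb_state(b, 0.0)
rgb_b = set_saturation_rgb_state(b, 1.0)

print("port A:", rgb_a)
print("port B:", rgb_b)
print("diverged:", rgb_a != rgb_b)
```

A specification that says only "the saturation slider sets saturation" permits both behaviors; the disagreement between ports is what signals that the spec needs to state where hue is stored.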

Technical innovations

Applying a precise, executable YAML specification as a single source of truth across multiple heterogeneous UI platforms and languages, enabling reliable multi-port implementations without re-implementation of core logic.
Pairing AI-assisted coding with parallel multi-version implementations as a built-in differential-testing correctness layer to surface subtle bugs and underspecifications early.
Reviving the concept of N-version programming practically by reducing the cost of multiple implementations via AI automation and a shared specification, shifting cost-benefit tradeoffs.
The split architecture of a shared generic interpreter combined with per-port native ‘escape hatch’ code permits declarative UI behavior specification while accommodating platform-specific rendering idiosyncrasies.
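The differential-testing idea in the list above can be sketched as a small harness that replays one action script against several implementations and groups them by the state they end in. Everything here is an assumption made for illustration: the port names, the action format, and the two recent-colors behaviors (chosen because the paper mentions commit semantics for recent colors as one discovered gap).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, int]   # hypothetical format: ("pick" | "commit", color)
State = Tuple[int, ...]    # final recent-colors list, most recent first
Port = Callable[[List[Action]], State]


def port_commit_only(actions: List[Action]) -> State:
    """Hypothetical behavior: only committed colors enter the recent list."""
    recent: List[int] = []
    for kind, color in actions:
        if kind == "commit" and color not in recent:
            recent.insert(0, color)
    return tuple(recent)


def port_every_pick(actions: List[Action]) -> State:
    """Hypothetical behavior: every picked or committed color is recorded."""
    recent: List[int] = []
    for _kind, color in actions:
        if color not in recent:
            recent.insert(0, color)
    return tuple(recent)


def differential_test(ports: Dict[str, Port], actions: List[Action]) -> Dict[State, List[str]]:
    """Group port names by the final state each one produces."""
    groups: Dict[State, List[str]] = defaultdict(list)
    for name, run in ports.items():
        groups[run(actions)].append(name)
    return dict(groups)


# Placeholder port names; the behaviors are assigned arbitrarily.
PORTS = {"port_a": port_commit_only, "port_b": port_commit_only, "port_c": port_every_pick}
script = [("pick", 0x3580C4), ("pick", 0xFF0000), ("commit", 0xFF0000)]

groups = differential_test(PORTS, script)
diverged = len(groups) > 1
for state, names in groups.items():
    print([hex(c) for c in state], "<-", names)
print("diverged:", diverged)
```

More than one group means either an implementation bug or a gap in the specification. The harness does not decide which; a majority among ports is a hint, not a verdict, and a human still has to settle the intended behavior in the spec.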

Datasets

Executable YAML specification — 22,866 lines — open source at https://github.com/jyh/jas
Automated tests — ~4,600 test functions — internal project repository
Manual test transcripts — 36 files — internal project repository

Baselines vs proposed

Conventional vector illustration apps (Adobe Illustrator, Inkscape, Affinity Designer): multi-developer multi-year efforts over decades vs. proposed: single developer, 5 platforms, ~120 evening hours
N=1 implementation: specification gaps stay hidden and bugs stay silent vs N=5 implementations: multiple specification gaps and behavioral bugs surfaced within hours
Color Panel per-port native code: OCaml: 0 vs Rust: 1,309 lines to support same 890-line YAML specification
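The Color Panel comparison can be turned into a rough worked calculation. The line counts below are the ones reported in this summary; the browser port's native count is not given here, so it is left out rather than guessed. Line counts are a crude proxy for effort, and the shared interpreter is not included.

```python
# Color Panel line counts as reported in this summary.
SHARED_YAML_LINES = 890
NATIVE_LINES = {"OCaml": 0, "Swift": 59, "Python": 123, "Rust": 1309}

# Share of each port's Color Panel definition that comes from the shared spec.
shared_share = {
    port: SHARED_YAML_LINES / (SHARED_YAML_LINES + native)
    for port, native in NATIVE_LINES.items()
}
for port, share in shared_share.items():
    print(f"{port:6s} native={NATIVE_LINES[port]:5d}  shared share={share:.0%}")

total_native = sum(NATIVE_LINES.values())
total_lines = SHARED_YAML_LINES + total_native

# The spec is written once, so its cost per port shrinks as ports are added.
spec_lines_per_port = SHARED_YAML_LINES / len(NATIVE_LINES)

print("total native lines:", total_native)
print("spec + native lines:", total_lines)
print("spec lines amortized per port:", spec_lines_per_port)
```

Under these assumptions the shared spec accounts for all of the OCaml port's Color Panel and well under half of the Rust port's, which is consistent with the summary's point that the need for escape-hatch code varies sharply by platform.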

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.07828.

Fig 3: The Color Panel rendered in all five ports, each showing the active color #3580c4 on the Dark

Fig 2: Color Panel spec amortization. Shared YAML (890 lines) drives five working implementations;

Fig 4 (page 7): Methodology workflow. The outer loop turns prose design into YAML specification; the inner loop

Fig 5 (page 7).

Limitations

The project is a single-developer case study, limiting generalizability to team settings or other application domains.
Manual testing remains the dominant cost, highlighting incomplete automation of visual and behavioral test coverage.
Some AI hallucinations and session drift occurred, requiring human oversight, particularly for symbol and file path hallucinations.
The shared specification cannot express all platform-specific behaviors, requiring escape hatch native code that needs careful management.
The paper offers no formal quantification of cross-port implementation independence or of the fault-tolerance benefits typical of traditional N-version programming.
The approach relies heavily on the specific AI assistant (Claude Code) and its persistent memory features, which may not generalize to other AI tools.

Open questions / follow-ons

Can the methodology scale to larger teams or more complex applications beyond the vector illustration domain?
How can manual testing be further reduced or optimized, possibly via AI-augmented visual regression tools?
What are the limits of specification expressiveness, and can escape hatch code be minimized or better integrated?
Can the approach be extended to formally quantify and mitigate correlated failure modes arising from shared AI priors?

Why it matters for bot defense

Although the paper does not directly address security or bot-defense contexts, the methodology offers useful insights for CAPTCHA and bot-defense engineers focused on reliable implementation across platforms. The use of a precise executable specification combined with differential testing across multiple parallel implementations suggests a robust engineering pattern for reducing implementation bugs and specification gaps in complex interactive systems, such as CAPTCHAs rendered on diverse devices. Moreover, AI-paired development could substantially reduce the engineering effort needed to maintain cross-platform bot-detection challenges while keeping behavior consistent and correct. The manual-testing findings reinforce the importance of human-in-the-loop validation for detecting subtle cross-platform divergences that automated tests miss and that automated bots could otherwise exploit. However, practical application would require adaptation to security threat models and adversarial robustness, which the paper does not cover.

Cite

bibtex

@article{arxiv2606_07828,
  title={Jas: AI-Paired Engineering as a Revival of N-Version Programming},
  author={Jason Hickey},
  journal={arXiv preprint arXiv:2606.07828},
  year={2026},
  url={https://arxiv.org/abs/2606.07828}
}
