TClone: Low-Latency Forking of Live GUI Environments for Computer-Use Agents

Source: arXiv:2605.17320 · Published 2026-05-17 · By Yutong Huang, Vikranth Srivatsa, Alex Asch, Hansin Tushar Patwa, Yiying Zhang

TL;DR

This paper addresses the challenge of providing live, forkable personal workspaces for computer-use agents (CUAs) that operate over interactive GUI environments on end-user PCs. CUAs execute complex tasks by manipulating open windows, files, browser sessions, credentials, and system state, which raises safety and quality concerns. Existing isolation and snapshotting solutions such as VMs, containers, and checkpoint/restore tools either impose too much latency or fail to capture the full GUI and application state needed to branch speculative executions. TClone introduces a versioned personal workspace system that provides fast branching, rollback, and selective commit of isolated workspace clones. It separates quick fork-time workspace duplication (using sibling containers, copy-on-write memory sharing, and filesystem versioning) from slower, asynchronous durable checkpointing. By running the entire GUI stack inside each container, TClone captures GUI-local state and supports the fine-grained workspace versioning needed for speculative search.

The implementation modifies the Linux kernel and CRIU to enable partial, lazy memory and filesystem sharing across container siblings and reconstructs consistent process trees with independent namespaces per branch. Evaluation on two CUA benchmarks shows TClone reduces workspace cloning latency by up to 4.9× versus KVM and 3.4× versus CRIU and cuts end-to-end agent task latency by 1.9× and 1.5× respectively. This demonstrates that TClone’s workspace versioning primitive enables safe, low-latency speculative execution in realistic GUI environments, supporting higher-quality agent workflows that need rollback and parallel exploration.

Key findings

TClone reduces container fork latency by up to 4.9× compared to KVM and 3.4× compared to CRIU (Fig 2).
End-to-end CUA task latency drops by 1.9× versus KVM and 1.5× versus CRIU on the GTA benchmark and OSWorld tasks (Fig 2b).
Agent speculative branches share memory and filesystem state via copy-on-write, avoiding eager copying of full workspace state.
TClone reconstructs entire process trees inside isolated sibling containers with independent PID and network namespaces, preserving process hierarchy.
Filesystem versioning uses a snapshot-capable on-disk format plus lazy copy-on-write page-cache sharing to optimize memory and disk usage.
Network connections internal to the workspace are checkpointed; external connections are closed and require re-establishment to avoid ambiguous side effects.
Running the entire GUI stack (Wayland compositor and clients) inside each container captures live display, windows, and clipboard state for consistent forks.
Security profiles based on human syscall and file access traces restrict agent container privileges via seccomp and SELinux, reducing ambient authority risks.
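
The trace-derived security profiles in the last finding can be illustrated with a minimal sketch. It assumes a hypothetical trace format (a flat list of syscall names observed during human use) and emits an allowlist in the JSON layout used by Docker/OCI seccomp profiles. The function name and the `min_count` threshold are illustrative assumptions, not TClone's profile generator, and the SELinux half of the policy is not modeled.

```python
import json
from collections import Counter

def build_seccomp_allowlist(trace_events, min_count=1):
    """Build a deny-by-default seccomp profile from observed syscall names.

    trace_events: iterable of syscall names (hypothetical trace format).
    min_count: ignore syscalls observed fewer than this many times.
    """
    counts = Counter(trace_events)
    allowed = sorted(name for name, n in counts.items() if n >= min_count)
    return {
        # Anything not explicitly allowed fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": allowed, "action": "SCMP_ACT_ALLOW"}],
    }

# Toy trace standing in for syscalls recorded while a human used the workspace.
human_trace = ["read", "write", "openat", "read", "close", "mmap"]
profile = build_seccomp_allowlist(human_trace)
print(json.dumps(profile, indent=2))
```

A syscall that never appears in the human trace (for example `ptrace`) is absent from the allowlist, which is the sense in which such a profile narrows the agent's ambient authority.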

Threat model

The adversary is a computer-use agent that executes actions within the user's personal workspace with permissions equivalent to the user. The adversary may produce harmful or erroneous commands that overwrite or expose sensitive data within the workspace, including files, GUI state, credentials, and processes. The adversary cannot break kernel-level isolation mechanisms, compromise container namespace boundaries, or bypass container security profiles enforced by seccomp and SELinux. TClone assumes the threat arises from misuse or confusion rather than a full compromise of the underlying OS kernel.

Methodology — deep read

Threat Model & Assumptions: The adversary is a potentially mistaken or malicious computer-use agent executing in the same personal workspace as the user, with inherited ambient authority including access to files, credentials, and applications. The adversary can perform malicious or erroneous actions that modify persistent state or leak secrets. The system assumes the adversary cannot break kernel-level isolation or security profile enforcement.
Data: Two agent benchmarking setups are used: AgentLoop running the GTA benchmark (600+ tasks) and Agent S3 running OSWorld tasks. The data includes real application and GUI workloads, user-driven traces used to generate security profiles, and measurements of system calls and file accesses. Splits and preprocessing are not fully specified; the measurements come from repeated agent task executions that record latency and memory.
Architecture / Algorithm: TClone extends the Linux kernel and CRIU to enable fast forking of full GUI desktop containers. It creates sibling containers with independent namespaces rather than parent-child forks, reconstructing the entire process tree inside the branch. Memory is shared copy-on-write across branches at the granularity of individual virtual memory areas via a custom kernel module. Filesystem branching uses a snapshot-capable filesystem layered with lazy copy-on-write page-cache sharing for file-backed memory. Each branch's network namespace isolates internal TCP connections, which are checkpointed; external connections are closed and must be explicitly re-established. The GUI stack (Wayland compositor and clients) runs inside each container so that GUI state is captured in full. Asynchronous durable checkpointing serializes memory snapshots and filesystem state in the background without blocking the critical path.
Training Regime: Not applicable.
Evaluation Protocol: Latency and memory are measured for workspace cloning and for end-to-end agent task execution, comparing against two baselines: KVM-based virtual machine cloning and CRIU container checkpoint/restore. Key metrics include fork latency, total agent task latency (including LLM calls), and CPU and memory overhead. Multiple tasks and agent trajectories from the benchmarks are tested. No adversarial or distribution-shift tests are described.
Reproducibility: A code release is planned upon acceptance but is not yet available. The system requires kernel and CRIU modifications, and the evaluation uses open benchmarks such as OSWorld. Detailed parameter settings for the kernel module, container configurations, and checkpoint timing are not fully documented in the truncated version.
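
The copy-on-write memory sharing described under Architecture / Algorithm can be sketched as a toy user-space model. This is an illustration only: the `Page` and `Branch` classes are hypothetical, the sketch tracks individual pages with reference counts rather than virtual memory areas, and nothing here reflects TClone's kernel module.

```python
class Page:
    """A physical page with a reference count."""
    def __init__(self, data):
        self.data = data
        self.refs = 1

class Branch:
    """A workspace branch holding a page table: virtual page number -> Page."""
    def __init__(self, table=None):
        self.table = table if table is not None else {}

    def map_page(self, vpn, data):
        self.table[vpn] = Page(data)

    def fork(self):
        # Share every page with the new branch; bump refcounts, copy no data.
        for page in self.table.values():
            page.refs += 1
        return Branch(dict(self.table))

    def read(self, vpn):
        return self.table[vpn].data

    def write(self, vpn, data):
        page = self.table[vpn]
        if page.refs > 1:
            # Page is shared: detach a private copy before modifying it.
            page.refs -= 1
            page = Page(page.data)
            self.table[vpn] = page
        page.data = data

source = Branch()
source.map_page(0, "clipboard: hello")
source.map_page(1, "window list")

child = source.fork()
child.write(0, "clipboard: edited by agent")

print(source.read(0))                      # clipboard: hello
print(child.read(0))                       # clipboard: edited by agent
print(child.table[1] is source.table[1])   # True: untouched page still shared
```

Fork cost in this model is proportional to the number of page-table entries, not to the bytes stored, and a branch pays for a copy only when it first writes to a shared page.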

Concrete Example: To fork a workspace, TClone first freezes the source container briefly and records process hierarchies, thread states, namespaces, and memory metadata. It then creates a sibling container with fresh namespaces and replays the process tree inside it using a restorer process. Anonymous and file-backed memory regions are mapped copy-on-write by a kernel module that references the frozen pages via pidfd. Filesystem state uses lazy copy-on-write layered page caches to avoid eager data duplication. Because the GUI stack (Wayland compositor and clients) runs inside the container, display and window state are captured along with it. Network connections internal to the container are checkpointed, while external connections are closed at fork time. Once the snapshot holders and filesystem are consistent, the source container resumes and durable checkpointing proceeds asynchronously in the background. This keeps the fork itself short, which is what speculative CUA execution needs.
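
The ordering in the example above (freeze, record, clone, resume, then persist off the critical path) can be sketched with a toy model. Everything here is a simplifying assumption: workspaces are plain dictionaries, `fork_workspace` is a hypothetical name, copies are eager rather than copy-on-write, and the steps run sequentially even though the paper's Fig 4 describes parallelizing them.

```python
import threading

def fork_workspace(source, checkpoint_store, log):
    """Toy fork: short frozen window, then durable persistence in background."""
    source["frozen"] = True
    log.append("freeze")

    # Record the state needed to rebuild the branch (toy stand-in for
    # process hierarchies, namespaces, and memory metadata).
    snapshot = {"procs": list(source["procs"]),
                "memory": dict(source["memory"])}
    log.append("record")

    # Build the sibling branch from the snapshot, not as a child of source.
    sibling = {"procs": list(snapshot["procs"]),
               "memory": dict(snapshot["memory"]),
               "frozen": False}
    log.append("clone")

    # The source resumes before anything is written durably.
    source["frozen"] = False
    log.append("resume")

    def persist():
        # Durable checkpointing, off the critical path.
        checkpoint_store.append(snapshot)
        log.append("checkpoint")

    worker = threading.Thread(target=persist)
    worker.start()
    return sibling, worker

source = {"procs": ["compositor", "browser"],
          "memory": {"clipboard": "hello"},
          "frozen": False}
store, log = [], []
sibling, worker = fork_workspace(source, store, log)
sibling["memory"]["clipboard"] = "changed in branch"
worker.join()

print(log)
print(source["memory"]["clipboard"])
```

The log always shows `resume` before `checkpoint` because the background thread is started only after the source has been unfrozen, and the branch's clipboard edit leaves both the source and the stored snapshot untouched.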

Technical innovations

Separation of fast fork-time branch creation from slower durable checkpointing to reduce latency on the critical path.
Reconstruction of entire process trees inside independent sibling containers with consistent namespace and PID mappings using pidfd references.
Fine-grained copy-on-write sharing of anonymous memory and lazy copy-on-write sharing of file-backed memory page caches across container siblings.
Running the full GUI stack (Wayland compositor and clients) inside the container to capture and version GUI-local display and input state.
Network namespace isolation with TCP-repair for internal connections and explicit closure and re-establishment of external connections at fork time.
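
The connection policy in the last item can be sketched as a simple classification step. This is a hedged illustration: the `Connection` type, the `plan_connections` function, and the idea of identifying internal connections by address membership are all assumptions made for the example, and no actual TCP-repair calls are shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Connection:
    local_addr: str
    remote_addr: str

def plan_connections(connections, workspace_addrs):
    """Split connections into those to checkpoint and those to close at fork.

    A connection counts as internal only if both endpoints belong to the
    workspace; anything touching the outside world is closed.
    """
    checkpoint, close = [], []
    for conn in connections:
        internal = (conn.local_addr in workspace_addrs
                    and conn.remote_addr in workspace_addrs)
        (checkpoint if internal else close).append(conn)
    return checkpoint, close

workspace_addrs = {"127.0.0.1", "10.0.0.2"}
conns = [
    Connection("127.0.0.1", "127.0.0.1"),     # app talking to a local service
    Connection("10.0.0.2", "93.184.216.34"),  # browser talking to a remote host
]
to_checkpoint, to_close = plan_connections(conns, workspace_addrs)
print(to_checkpoint)
print(to_close)
```

Closing external connections means a branch cannot silently replay traffic to a remote peer; the application has to reconnect, which keeps external side effects explicit.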

Datasets

GTA benchmark (AgentLoop) — 600+ computer-use tasks — publicly referenced
OSWorld (Agent S3) — multiple GUI application automation tasks — publicly referenced

Baselines vs proposed

KVM virtual machine cloning: workspace fork latency = 18.2s vs TClone = 3.7s (up to 4.9× improvement)
CRIU container checkpoint/restore: workspace fork latency = 12.8s vs TClone = 3.7s (up to 3.4× improvement)
End-to-end agent task latency with GPT-5.5 calls: KVM = 1002s vs TClone = 531s (1.9× speedup)
End-to-end agent task latency with GPT-5.5 calls: CRIU = 777s vs TClone = 531s (1.5× speedup)
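
As a quick arithmetic check, the speedups can be recomputed from the latencies listed above. The sketch uses only the rounded figures from this summary. Three of the four ratios round to the reported multipliers, while the CRIU fork ratio comes out near 3.46, slightly above the reported 3.4×. That gap may reflect rounding of the listed latencies or an "up to" figure taken from a different measurement.

```python
def speedup(baseline_s, tclone_s):
    """Ratio of baseline latency to TClone latency (higher is better)."""
    return baseline_s / tclone_s

# (baseline seconds, TClone seconds), taken from the list above.
rows = {
    "KVM fork": (18.2, 3.7),
    "CRIU fork": (12.8, 3.7),
    "KVM end-to-end": (1002, 531),
    "CRIU end-to-end": (777, 531),
}

ratios = {name: speedup(b, t) for name, (b, t) in rows.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```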

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.17320.

Fig 3: Overview of TClone Personal Workspace Versioning.

Fig 4: TClone Workspace Fork Procedure. TClone parallelizes snapshot, clone, and memory-state persistence.

Fig 5: Linux Process Fork vs. TClone Process-Tree Fork. Left: native fork() duplicates one process as a child of the source

Fig 6: Lazy CoW File-Cache Versioning. TClone forking address space 2 and 3 from the original address space 1.

Limitations

No adversarial robustness testing against actively malicious agents was shown; security guarantees are policy and kernel-enforced but not formally verified.
Evaluation is mainly on the OSWorld and GTA benchmarks; generalization to other GUI environments and workflows is unclear.
The system depends on kernel and CRIU modifications that may limit portability and require careful maintenance.
External side effects (e.g., network requests, sending emails) are not versioned or reversible; they remain policy boundaries.
The impact on user workloads with very large memory or GUI state was not detailed, and scalability limits were not fully explored.
Security profiles are derived from human traces that may not fully capture agent behaviors or tooling usage.

Open questions / follow-ons

How does TClone handle fully adversarial agents that actively attempt to subvert container isolation or escalate privileges?
Can TClone’s versioning and merge semantics be extended to incorporate external side effects in a reversible or auditable way?
Will the approach scale efficiently to extremely large workspaces with many GUI applications and heavy memory footprints?
What are the usability implications and integration challenges of TClone in typical end-user desktop environments?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, TClone’s approach of fast branching and rollback of full interactive user workspaces is highly relevant for building safer and more flexible AI agent deployments. Ensuring that automated agents can experiment speculatively without risk of corrupting persistent user state is analogous to preventing bots from tampering with core platform integrity. The separation of fast in-memory cloning from durable checkpointing can inspire low-latency, sandboxed environments for CAPTCHA verification or interaction analysis. Additionally, TClone’s use of fine-grained copy-on-write and containerized GUI stacks provides a model for isolating and auditing agent actions to detect malicious behavior or limit ambient authority. However, CAPTCHAs typically focus on challenge-response puzzles rather than continuous speculative branching, so the application would be more indirect, supporting backend agent sandboxing rather than client interaction. Still, techniques like container-level syscall filtering and namespace isolation align with defense-in-depth strategies for bot containment.

Cite

bibtex

@article{arxiv2605_17320,
  title={TClone: Low-Latency Forking of Live GUI Environments for Computer-Use Agents},
  author={Yutong Huang and Vikranth Srivatsa and Alex Asch and Hansin Tushar Patwa and Yiying Zhang},
  journal={arXiv preprint arXiv:2605.17320},
  year={2026},
  url={https://arxiv.org/abs/2605.17320}
}
