Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

Source: arXiv:2605.06393 · Published 2026-05-07 · By Di Lu, Bo Zhang, Xiyuan Li, Yongzhi Liao, Xuewen Dong, Yulong Shen et al.

TL;DR

This paper addresses a structural security gap in self-hosted computer-use agents (SHCUAs) — systems like OpenClaw that combine LLM-driven planning with direct access to host-side resources including filesystems, shell execution, browsers, plugins, and external communication channels. The core problem is not that these agents are malicious, but that a legitimately deployed agent with broad operational reach can be repurposed by an adversary through prompt injection, malicious browser content, unsafe plugins, or host-side control-path tampering — without requiring the attacker to first compromise the host OS in the traditional sense. The authors argue that coarse blacklisting of dangerous commands is fundamentally insufficient because the security criticality of any operation depends jointly on action type, target object, execution context, and potential effect, making a flat rule-set unable to distinguish writing a workspace note from modifying /etc/passwd.

The paper's primary contribution is a two-part answer to this problem. First, it introduces an operation-centric formal security model that represents each SHCUA action as a five-tuple O = <s, a, r, c, e> (subject, action, object, context, effect) and defines four ordered risk projections over these dimensions, yielding a per-operation security-criticality level from a four-class ordered set L0–L3. Second, it proposes a risk-driven minimal-confinement architecture that keeps ordinary low-risk operations on the existing Rich Execution Environment (REE) path while elevating only security-critical classification, authorization, parameter binding, trusted evidence generation, and user-notification triggering into a TEE-backed trusted operation plane. The architecture is instantiated on OpenClaw using Intel TDX as the TEE backend, with remote terminal-side trusted components that verify TDX-audited commands before constrained local execution.

The evaluation demonstrates that the architecture can block unsafe or policy-disallowed operations before execution, preserve ordinary functionality for allowed workloads, and produce auditable, cryptographically backed evidence. Overhead is described as deployment-dependent. Precise latency numbers are not visible in the truncated text, but the authors report that the trusted operation plane adds measurable but bounded overhead, primarily for L2/L3 operations, while L0 operations incur no TEE-path overhead because they remain entirely on the REE path. The paper is positioned as a systems contribution rather than a pure ML contribution, with the security model and architecture as the primary artifacts.

Key findings

The operation-centric model classifies SHCUA operations into four security levels (L0–L3) using four risk projections: action sensitivity α(a), object criticality β(r), contextual risk γ(c), and effect severity δ(e), each drawn from an ordinal scale {0,1,2,3}. This enables formally distinct enforcement decisions for superficially similar actions (e.g., write to /workspace/summary.txt scores v=<low,low,low,low>→L0→d_ree, while write to /etc/passwd scores v=<1,3,2,3>→L3→d_uc/d_deny).
Five distinct enforcement decisions are defined and mapped to risk levels: d_ree (direct REE execution) for L0, d_ia (isolated authorization + constrained execution) for L1, d_ie (isolated execution) for L2, d_uc (isolated execution + user confirmation) for L3 with confirmation available, and d_deny (denial) for L3 without confirmation. The enforcement policy is thus formally expressible as a single decision function η(O)=Ψ(ρ(O),χ(O)).
The architecture deliberately confines only the minimal trusted operation plane inside the TEE (Intel TDX), not the full SHCUA stack, implementing a risk-driven minimal-confinement philosophy that preserves REE-side compatibility for ordinary workloads while protecting classification, authorization, binding, audit, and notification logic from REE-level compromise.
Trusted operation requests carry eight structured fields <sid, act, obj, scope, ctx, level, seq, ttl> including replay-resistance fields (seq, ttl), ensuring that an attacker who can manipulate the REE cannot reuse, replay, or silently modify a previously authorized request.
The system is reported to block unsafe or policy-disallowed operations before execution and to preserve ordinary functionality for allowed workloads in the experimental deployment on OpenClaw with Intel TDX, though precise per-operation latency breakdowns and false-positive/false-negative rates are not visible in the provided text excerpt.
Publicly reported real-world vulnerabilities in OpenClaw — local-instance hijacking via malicious websites ('ClawJacked'), one-click RCE from auth-token leakage, malicious skill poisoning in the skill marketplace, and malware delivery through fake installers — are cited as motivating evidence that the SHCUA abuse surface is already exploited in practice, not merely theoretical.
The paper explicitly scopes out physical attacks, direct TEE substrate compromise, and operations explicitly authorized by a legitimate user, focusing instead on the case where a legitimately deployed agent is steered toward unauthorized effects through its natural-language and tool-use surface.

Threat model

The adversary's goal is to induce a legitimately deployed SHCUA to perform unsafe host-level operations over real system resources without necessarily first compromising the host OS. The adversary may operate through: malicious messages sent to the agent, browser-delivered content triggering indirect prompt injection, unsafe or poisoned skills/plugins installed from a marketplace, tampering with the host-side control path (e.g., altering operation parameters after authorization or forging audit evidence in the REE), and interference with the model dependency chain (poisoned model updates, manipulated inference services, untrusted model-side inputs). The adversary cannot: physically attack hardware, directly compromise the TDX TEE substrate, exploit side channels in the TEE implementation, or override operations that a legitimate authorized user has explicitly requested and policy permits. The key structural assumption is that the SHCUA already possesses broad operational reach by design, so the adversary need not implant a new capability — only redirect an existing one. The trust boundary is explicitly placed such that the REE is not trusted for security-critical classification, authorization, audit, or evidence generation, meaning the adversary is assumed capable of influencing or observing these REE-side operations.

Methodology — deep read

The threat model centers on an adversary who does not need to compromise the host OS to induce unsafe operations. Instead, the adversary can influence the SHCUA through malicious messages, browser-delivered content (indirect prompt injection), unsafe plugins or skills, tampering with the host-side control path, or poisoning the model dependency chain (e.g., manipulated inference services or poisoned model updates). The SHCUA is assumed to be deployed inside a constrained REE runtime domain that does not grant it all privileges directly — this constrained hosting is a prerequisite assumption, not something the paper provides. The adversary's goal is to have the agent execute a security-critical operation (file exfiltration, credential access, privileged config modification, command execution across trust boundaries) while the system's enforcement mechanisms either fail to classify it correctly or are themselves manipulated. The attacker cannot physically attack hardware, directly compromise the TDX substrate, or override operations explicitly authorized by a legitimate user.

The core security model is built around a five-tuple operation instance O = <s, a, r, c, e> over domains (subject, action, object, context, effect). Four risk projection functions (α:A→L_A, β:R→L_R, γ:C→L_C, δ:E→L_E) each map their respective dimension onto an ordinal level set {0,1,2,3}. These four projections are composed into a risk feature vector v(O)=<α(a),β(r),γ(c),δ(e)>. A deployment-dependent aggregation function Φ then maps this vector to a security-criticality level in L={L0,L1,L2,L3}. The paper does not specify a universal Φ; it explicitly states that concrete deployments may instantiate Φ using policy tables, rule sets, or risk classifiers, as long as it remains bound to the concrete operation instance. This is both a strength (flexibility) and a limitation (no universal calibration).
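To make the composition of projections and aggregation concrete, the following Python sketch models an operation instance, builds the risk feature vector, and applies one possible Φ. Because the paper leaves Φ deployment-dependent, the max-based aggregation is purely an illustrative assumption, as are the names `Operation`, `risk_vector`, and `phi_max` and the choice to pass the projections in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

LEVELS = (0, 1, 2, 3)  # ordinal scale shared by all four projections


@dataclass(frozen=True)
class Operation:
    """Operation instance O = <s, a, r, c, e>."""
    subject: str
    action: str
    obj: str
    context: str
    effect: str


# A projection maps one dimension of O onto the ordinal scale.
Projection = Callable[[str], int]
RiskVector = Tuple[int, int, int, int]


def risk_vector(op: Operation, alpha: Projection, beta: Projection,
                gamma: Projection, delta: Projection) -> RiskVector:
    """Compose v(O) = <alpha(a), beta(r), gamma(c), delta(e)>."""
    v = (alpha(op.action), beta(op.obj), gamma(op.context), delta(op.effect))
    if any(x not in LEVELS for x in v):
        raise ValueError(f"projection outside ordinal scale: {v}")
    return v


def phi_max(v: RiskVector) -> str:
    """ASSUMPTION: one possible Phi that takes the worst dimension as the level.

    The paper does not prescribe this rule; a deployment could equally use a
    policy table or a classifier.
    """
    return f"L{max(v)}"
```

Under this assumed rule, `phi_max((1, 3, 2, 3))` evaluates to `"L3"`, which agrees with the level the summary reports for the /etc/passwd write. An all-zero vector maps to `"L0"`.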

From the risk level, an enforcement decision function η(O)=Ψ(ρ(O),χ(O)) produces one of five decisions: d_ree, d_ia, d_ie, d_uc, or d_deny. Here ρ(O) is the security-criticality level assigned to O (in this summary's reading, the output of Φ applied to v(O)), and χ(O) captures auxiliary policy conditions such as whether a TEE backend is available and whether user confirmation is reachable. The NeedIso predicate identifies whether an operation must pass through the trusted operation plane. Operations classified as L0 never enter the TEE path; L1–L3 operations do, with escalating enforcement stringency.
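The level-to-decision mapping described above can be sketched as a small Python function. This is a simplification under stated assumptions: χ(O) is reduced to a single boolean for whether user confirmation is reachable (TEE availability is not modeled), and `need_iso` encodes the reading that every decision other than direct REE execution passes through the trusted operation plane. The names `Decision`, `psi`, and `need_iso` are hypothetical and not taken from the paper.

```python
from enum import Enum


class Decision(Enum):
    REE = "d_ree"    # direct REE execution
    IA = "d_ia"      # isolated authorization + constrained execution
    IE = "d_ie"      # isolated execution
    UC = "d_uc"      # isolated execution + user confirmation
    DENY = "d_deny"  # denial


def psi(level: int, confirmation_available: bool) -> Decision:
    """Map a security-criticality level (0..3) to an enforcement decision.

    ASSUMPTION: chi(O) is collapsed into one flag, confirmation_available.
    """
    if level == 0:
        return Decision.REE
    if level == 1:
        return Decision.IA
    if level == 2:
        return Decision.IE
    if level == 3:
        return Decision.UC if confirmation_available else Decision.DENY
    raise ValueError(f"unknown level: {level}")


def need_iso(decision: Decision) -> bool:
    """ASSUMPTION: isolation is needed for every decision except d_ree."""
    return decision is not Decision.REE
```

For example, `psi(3, False)` yields `Decision.DENY`, and `need_iso(psi(0, True))` is `False`, so an L0 operation stays on the ordinary REE path.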

The system architecture separates the SHCUA runtime into a REE domain (application layer, privileged service layer, host resource layer, platform enforcement layer) and a trusted isolation domain containing only a minimal trusted operation plane. The REE-side operation-extraction layer captures action/object/context information from the SHCUA runtime's tool invocations and passes them to an operation dispatcher and request builder, which constructs a normalized trusted operation request A=<sid, act, obj, scope, ctx, level, seq, ttl>. This request is submitted to the trusted operation plane running inside TDX, which performs the O→ρ(O)→η(O) mapping under hardware-backed isolation. The scope field in A constrains what REE-side constrained executors are permitted to do with the operation, preventing post-authorization parameter substitution. Replay resistance is provided by (seq, ttl). For L3 operations, the trusted operation plane triggers user notification/confirmation before execution; for L2, it proceeds to isolated execution with trusted audit evidence; for L1, it performs isolated authorization with constrained REE execution; for L0, the operation stays on the ordinary REE path.
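The role of the (seq, ttl) fields and of parameter binding can be illustrated with a short Python sketch. The source does not describe the actual evidence format or cryptographic mechanism, so the HMAC tag over the serialized request is only a stand-in, and treating `ttl` as an absolute expiry timestamp is an assumption. `TrustedRequest`, `bind`, and `ReplayGuard` are hypothetical names.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrustedRequest:
    """Normalized request A = <sid, act, obj, scope, ctx, level, seq, ttl>."""
    sid: str
    act: str
    obj: str
    scope: str
    ctx: str
    level: int
    seq: int
    ttl: float  # ASSUMPTION: absolute expiry time, in seconds


def bind(req: TrustedRequest, key: bytes) -> str:
    """Bind all eight fields together (HMAC-SHA256 is a stand-in mechanism)."""
    payload = json.dumps(asdict(req), sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


class ReplayGuard:
    """Accept a request only if it is untampered, unexpired, and fresh."""

    def __init__(self) -> None:
        self._last_seq: dict[str, int] = {}

    def accept(self, req: TrustedRequest, tag: str, key: bytes, now: float) -> bool:
        if not hmac.compare_digest(tag, bind(req, key)):
            return False  # a field changed after authorization
        if now > req.ttl:
            return False  # authorization has expired
        if req.seq <= self._last_seq.get(req.sid, -1):
            return False  # replayed or out-of-order sequence number
        self._last_seq[req.sid] = req.seq
        return True
```

In this sketch, presenting the same request and tag twice fails on the second attempt, and changing `obj` or `scope` after the tag was computed fails the binding check.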

The prototype is implemented on OpenClaw with Intel TDX as the trusted backend. Remote terminal devices run local trusted components that verify TDX-audited commands before constrained local execution — these terminals are not full SHCUA hosts but verification endpoints. The experimental deployment topology (described in a figure caption referencing Intel hardware, though full details are truncated) instantiates this cloud-native architecture concretely. The evaluation covers three dimensions: (1) security analysis — whether unsafe or policy-disallowed operations are blocked pre-execution across representative SHCUA task scenarios; (2) ordinary functionality preservation — whether allowed workloads proceed without disruption; and (3) performance overhead — latency cost of routing operations through the trusted operation plane versus the direct REE path, described as deployment-dependent. Full latency tables and ablation details are in the truncated portion of the paper.

A concrete end-to-end example is provided in Section III.F: in a document-management task, a write to /workspace/summary.txt is scored v(O1)=<low,low,low,low>→L0→d_ree and passes directly through the REE, while an attempted write to /etc/passwd is scored v(O2)=<1,3,2,3>→L3→d_uc or d_deny. Similarly, reading ~/.ssh/id_rsa is distinguished from reading a project note despite both being 'read' actions, because β(~/.ssh/id_rsa)=3 (critical object) and δ(e) reflects high confidentiality/externalization risk.

Reproducibility: no code repository is mentioned in the provided text. The paper appears to be an IEEE journal submission dated 2026, and the arXiv version is from May 2026, which suggests it is pre-publication with unclear artifact availability.
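The object-criticality distinction in the Section III.F example (same action, different object) can be sketched as a path-based β projection in Python. The pattern table is hypothetical: only the value 3 for /etc/passwd and ~/.ssh/id_rsa comes from the summary, while the 0 for workspace files and the cautious default of 2 for unknown paths are assumptions chosen for illustration.

```python
import posixpath
from fnmatch import fnmatchcase

# HYPOTHETICAL policy table: (glob pattern, object criticality in {0,1,2,3}).
# Earlier entries win, so critical patterns are listed first.
OBJECT_CRITICALITY = [
    ("/etc/passwd", 3),
    ("*/.ssh/id_*", 3),
    ("/workspace/*", 0),  # ASSUMPTION: "low" is modeled as 0
]
DEFAULT_CRITICALITY = 2  # ASSUMPTION: unknown objects are treated cautiously


def beta(path: str) -> int:
    """Object-criticality projection beta(r) for filesystem paths."""
    # Normalize first so "/workspace/../etc/passwd" cannot masquerade
    # as a workspace file.
    normalized = posixpath.normpath(path)
    for pattern, level in OBJECT_CRITICALITY:
        # fnmatchcase's "*" also matches "/" characters.
        if fnmatchcase(normalized, pattern):
            return level
    return DEFAULT_CRITICALITY
```

With this table, `beta("~/.ssh/id_rsa")` and `beta("/workspace/../etc/passwd")` both return 3, while `beta("/workspace/summary.txt")` returns 0, even though the action applied to each path could be identical.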

Technical innovations

An operation-centric formal security model that represents each SHCUA operation as a five-tuple <s,a,r,c,e> and computes per-instance risk via four orthogonal projection functions, moving beyond prior flat-blacklist or tool-name-based approaches to agent confinement.
A risk-driven minimal-confinement architecture that isolates only the smallest necessary control-plane subset (classification, authorization, binding, audit, notification) inside a TEE rather than migrating the full agent stack, distinguishing it from prior TEE deployments that protect entire workloads or use TEEs only for data confidentiality.
A normalized trusted operation request format A=<sid,act,obj,scope,ctx,level,seq,ttl> that serves as a non-bypassable normalization boundary between REE-side analysis and TEE-side enforcement, with replay resistance and post-authorization parameter-binding guarantees that prevent scope-substitution attacks on the control path.
Application of Intel TDX (VM-level confidential computing) specifically to the SHCUA control plane with remote terminal-side verification of TDX-audited commands, extending the trusted boundary beyond the server to constrained endpoints without requiring full TEE deployment on terminal devices.
Formal definition of the NeedIso predicate that derives, directly from the enforcement decision function, which operations structurally require trusted isolation — providing a principled rather than ad hoc criterion for TEE invocation.

Baselines vs proposed

REE-only enforcement (no TEE) vs proposed TEE-backed design: the proposed design is reported to block unsafe or policy-disallowed operations before execution. This is a qualitative claim; the baseline's block rate is not specified numerically in the provided text, and the quantitative breakdown is in the truncated sections.
Direct REE execution path (L0 operations) vs proposed: the path is unchanged, so L0 operations involve no TEE invocation and incur no added overhead.
Full-stack TEE deployment (hypothetical blanket approach) vs proposed minimal trusted operation plane: the blanket approach carries a higher resource cost (implied by the authors' motivation for minimal confinement), whereas the proposed design incurs deployment-dependent overhead only on L1–L3 operation paths. Precise millisecond values are in the truncated evaluation section.

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.06393.

Fig 3: Experimental deployment topology. OpenClaw runs on the Intel

Fig 2 through Fig 8 (page 17).

Limitations

The aggregation function Φ that maps risk feature vectors to security levels is explicitly left deployment-dependent, with no universal definition. The paper provides no empirical calibration, ground-truth labeling methodology, or inter-rater reliability study for how real SHCUA operations should be scored on the {0,1,2,3} ordinal scales, leaving a critical gap between the formal model and any concrete instantiation.
No adversarial evaluation is visible in the provided text: the paper does not report whether an adaptive attacker who knows the scoring rubric (e.g., crafting prompts to keep β(r) and δ(e) scores low while achieving sensitive effects) can evade the model — a standard requirement for published confinement systems.
The evaluation relies on OpenClaw as a single SHCUA instantiation; generalizability to other agents (Claude Computer Use, Microsoft AutoGen, Operator-style systems) with different tool APIs, plugin architectures, and planning loops is not demonstrated.
The constrained REE runtime domain — a prerequisite assumption of the entire architecture — is assumed rather than constructed by this paper; if an operator deploys OpenClaw without proper sandboxing, the non-bypassability guarantee of the control path may not hold in practice.
Physical attacks, direct TEE substrate compromise, and side-channel attacks on TDX (which have been demonstrated in prior academic work) are explicitly out of scope, but the paper does not discuss the practical risk surface this exclusion creates for real cloud deployments where co-tenancy is common.
Performance overhead is described as 'deployment-dependent' without a clear worst-case bound visible in the provided excerpt, making it difficult for practitioners to assess feasibility for high-throughput agent deployments.

Open questions / follow-ons

How should the aggregation function Φ be calibrated across diverse SHCUA deployments and task domains — could a learned risk classifier trained on labeled operation traces replace or augment the policy-table approach, and how would such a classifier resist adversarial manipulation of its input features?
Multi-step operation chains where each individual step scores as L0/L1 but the combined effect is L3 (e.g., reading credential fragments across multiple benign-looking reads) are acknowledged as a contextual risk factor γ(c), but no concrete mechanism for tracking cumulative cross-step risk is specified. How would stateful cross-operation risk accounting be implemented without prohibitive TEE overhead?
The architecture assumes a cloud-native server hosting the SHCUA runtime with TDX available; how would the design degrade or need to be restructured for edge or consumer-device deployments where hardware TEEs are absent, weaker (ARM TrustZone), or expose larger attack surfaces?
Given that prompt injection is listed as a primary attack vector, and the operation-extraction layer on the REE side must parse and normalize SHCUA tool invocations before they reach the TEE, how robust is the extraction layer itself to adversarial inputs designed to confuse normalization and produce malformed or under-specified trusted operation requests?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, this paper is most directly relevant as a case study in the emerging threat of SHCUA-based bot traffic. As LLM-driven computer-use agents become capable of autonomous browser interaction, form submission, and multi-channel communication, the classical assumption that automated traffic is produced by purpose-built bots with static behavioral signatures is increasingly strained. An SHCUA performing a browser-automation task produces interaction patterns that may closely mimic human behavior at the application layer, since it uses real browser APIs, responds to page content dynamically, and can execute arbitrary JavaScript-visible interactions. The threat model in this paper — where a legitimate agent is steered by injected instructions toward abusive operations — maps directly onto scenarios where an enterprise-deployed agent is repurposed to conduct credential stuffing, scraping, or account creation at scale, potentially with a legitimate user's session context and device fingerprint.

From a defensive design standpoint, the operation-centric risk model offers a conceptual framework that bot-defense engineers could adapt for classifying browser sessions by the inferred intent and effect of the action sequence, not just by surface-level behavioral signals. The paper's insight that action type alone is insufficient (the same 'click' or 'fill form' action has radically different risk profiles depending on target object, context, and effect) parallels the challenge of distinguishing benign automation from abusive automation in web traffic. However, practitioners should note that the paper's architecture requires instrumentation of the agent itself (operation extraction on the REE side of the agent's host), which is unavailable to a web service receiving inbound requests. Translating the model to a purely observer-side detection context would require inferring the equivalent of the five-tuple O from external behavioral signals, which is a non-trivial and open research problem.

Cite

@article{arxiv2605_06393,
  title={ Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation },
  author={ Di Lu and Bo Zhang and Xiyuan Li and Yongzhi Liao and Xuewen Dong and Yulong Shen and Zhiquan Liu and Jianfeng Ma },
  journal={arXiv preprint arXiv:2605.06393},
  year={ 2026 },
  url={https://arxiv.org/abs/2605.06393}
}
