BitC-3DGS: High-Capacity 3D Gaussian Splatting Watermarking via Bit Compression
Source: arXiv:2605.29583 · Published 2026-05-28 · By Yuquan Bi, Baosheng Yu, Yingke Lei, Jianwei Yang, Hongsong Wang, Jie Gui et al.
TL;DR
This paper addresses the problem of embedding high-capacity watermarks into 3D Gaussian Splatting (3DGS) assets to enable robust ownership and provenance verification in large-scale 3D content pipelines. Prior semantic watermarking methods using pre-trained CLIP text encoders are limited to 77-bit messages due to CLIP's fixed token context length. The key innovation is BitC-3DGS, a framework that compresses multiple bits into each semantic token, overcoming this limit and enabling watermark messages beyond 77 bits. To decode compressed messages, a novel dual-branch decoder jointly predicts the chunk-level token indices and the bit-level representation, improving message recovery accuracy. Additionally, a hard-message sampling strategy progressively prioritizes difficult bit patterns during decoder training to mitigate fixed-subset bias and improve generalization. Experiments on synthetic Blender and real LLFF 3D datasets demonstrate that BitC-3DGS can embed and recover 128-bit watermarks with accuracy comparable to 64-bit messages in prior work while maintaining high rendering fidelity (PSNR, SSIM, LPIPS). The approach outperforms state-of-the-art 3DGS watermarking baselines in capacity, robustness to visual and geometric distortions, and recovery of unseen messages.
Key findings
- BitC-3DGS supports 128-bit watermark capacity with decoding accuracy (~91.4%) comparable to prior methods’ 64-bit (~95.07%) performance (Table II).
- At 64 bits, BitC-3DGS achieves 97.9% bit accuracy (In) vs 95.07% for the GuardSplat baseline (Table I).
- Hard-message sampling reduces the performance gap between seen and unseen messages from 4.29% to 1.18% at 64 bits, demonstrating improved generalization (Table IV).
- Bit-compressed tokenization groups multiple bits per token, increasing payload under the fixed 77-token CLIP context length.
- Dual-branch decoder jointly optimizes chunk-level cross-entropy loss and bit-level BCE loss, improving decoding reliability.
- BitC-3DGS maintains high rendering quality: at 128 bits, PSNR rises from 33.01 to 34.32 and LPIPS falls from 0.0099 to 0.0068 relative to GuardSplat with the bit-compressed tokenizer (Table II).
- Robustness tests under seven 2D perturbations and three 3D geometric attacks show BitC-3DGS consistently outperforms previous methods, maintaining >94% decoding accuracy under combined distortions (Table III).
- BitC-3DGS also outperforms a strong baseline, GuardSplat augmented with the bit-compressed tokenizer, which suggests that compression alone is not enough and that the specialized decoder contributes to the gain.
Threat model
The adversary is assumed to have access to rendered 2D views of the watermarked 3DGS asset but not to the internal 3D representation or embedding secret. They may attempt common visual or geometric distortions (cropping, noise, pruning), but cannot remove or overwrite the watermark without degrading rendering quality. The watermark must be recoverable from arbitrary viewpoints and robust to such perturbations. The adversary cannot manipulate the pretrained CLIP encoder or decoder used for embedding and extraction.
Methodology — deep read
The paper first frames the problem: embed a high-capacity watermark into a 3D Gaussian Splatting (3DGS) representation so that ownership information can be reliably extracted from rendered views, even after 2D and 3D distortions and transformations.
Data consists of two benchmark datasets: Blender (8 synthetic objects with controlled camera trajectories) and LLFF (7 real-world scenes with handheld captures). For each, models are evaluated on 200 held-out views using standard splits, measuring bit recovery accuracy, rendering fidelity (PSNR, SSIM, LPIPS), and robustness to various attacks.
The core architecture includes a bit-compressed tokenization scheme: a binary message of length L is split into C chunks of n bits, each chunk mapped to one of 2^n semantic tokens from the CLIP vocabulary via a deterministic position-aware lookup table. This compresses multiple bits per token, overcoming CLIP's 77-token limit. The token sequence is zero-padded to 77 tokens for encoding.
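The chunk-to-token mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the token ids in `build_lookup` are hypothetical placeholders (the paper maps chunks to real CLIP vocabulary tokens), and the MSB-first bit order within a chunk is an assumption.

```python
from typing import List

CONTEXT_LEN = 77  # CLIP text context length, as stated in the summary
PAD_ID = 0        # zero-padding id, as described above

def build_lookup(num_chunks: int, n_bits: int, base_id: int = 1000) -> List[List[int]]:
    """Hypothetical position-aware table: table[pos][value] -> token id.

    Each position gets its own disjoint block of 2**n_bits ids, so the same
    chunk value maps to a different token at different positions.
    """
    size = 2 ** n_bits
    return [[base_id + pos * size + v for v in range(size)] for pos in range(num_chunks)]

def encode(bits: List[int], n_bits: int, table: List[List[int]]) -> List[int]:
    """Split the message into n-bit chunks and map each chunk to one token."""
    if len(bits) % n_bits != 0:
        raise ValueError("message length must be a multiple of n_bits")
    num_chunks = len(bits) // n_bits
    if num_chunks > CONTEXT_LEN:
        raise ValueError("message does not fit in the context window")
    tokens = []
    for pos in range(num_chunks):
        chunk = bits[pos * n_bits:(pos + 1) * n_bits]
        value = int("".join(map(str, chunk)), 2)  # MSB-first (assumption)
        tokens.append(table[pos][value])
    return tokens + [PAD_ID] * (CONTEXT_LEN - num_chunks)

def decode(tokens: List[int], n_bits: int, table: List[List[int]]) -> List[int]:
    """Invert the lookup: recover the bits from the non-padding tokens."""
    bits: List[int] = []
    for pos, row in enumerate(table):
        value = row.index(tokens[pos])
        bits.extend(int(b) for b in format(value, f"0{n_bits}b"))
    return bits

message = [1, 0, 1, 1, 0, 0, 0, 1] * 8      # 64-bit example message
table = build_lookup(num_chunks=32, n_bits=2)
tokens = encode(message, 2, table)
```

Because the table is deterministic, `decode(tokens, 2, table)` returns the original 64 bits; in the actual pipeline this inversion is learned by the decoder from CLIP embeddings rather than read off the token ids.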
To decode, a dual-branch decoder operates on normalized 512-D CLIP text embeddings. The chunk branch predicts discrete chunk indices with a transformer encoder and is trained with a cross-entropy loss. The bit branch outputs bit logits directly, refines them with gated self-attention, and is trained with a binary cross-entropy loss. Three losses are combined as a weighted sum: the chunk loss, the projected-bit loss (bit probabilities marginalized from the chunk probabilities), and the direct bit loss.
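The three loss terms can be sketched in plain Python. This is a minimal illustration rather than the paper's code: the weights `w_chunk`, `w_proj`, and `w_bit`, the MSB-first bit order, and the per-term averaging are all assumptions, and the decoder networks themselves are replaced by raw logits.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def bce(p: float, target: int, eps: float = 1e-7) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def projected_bit_probs(chunk_probs: List[float], n_bits: int) -> List[float]:
    """Marginalize P(chunk value) into P(bit_j = 1), MSB first."""
    probs = []
    for j in range(n_bits):
        shift = n_bits - 1 - j
        probs.append(sum(p for v, p in enumerate(chunk_probs) if (v >> shift) & 1))
    return probs

def dual_branch_loss(chunk_logits: List[List[float]], bit_logits: List[float],
                     bits: List[int], n_bits: int,
                     w_chunk: float = 1.0, w_proj: float = 1.0, w_bit: float = 1.0) -> float:
    ce_total = proj_total = bit_total = 0.0
    for c, logits in enumerate(chunk_logits):
        target_bits = bits[c * n_bits:(c + 1) * n_bits]
        target_value = int("".join(map(str, target_bits)), 2)
        probs = softmax(logits)
        ce_total += -math.log(max(probs[target_value], 1e-7))       # chunk branch
        for p, t in zip(projected_bit_probs(probs, n_bits), target_bits):
            proj_total += bce(p, t)                                  # projected bits
    for logit, t in zip(bit_logits, bits):
        bit_total += bce(sigmoid(logit), t)                          # bit branch
    return (w_chunk * ce_total / len(chunk_logits)
            + w_proj * proj_total / len(bits)
            + w_bit * bit_total / len(bits))

bits = [1, 0, 0, 1]  # two chunks of n=2 bits: values 2 and 1
good = dual_branch_loss([[-5, -5, 5, -5], [-5, 5, -5, -5]], [5, -5, -5, 5], bits, 2)
bad = dual_branch_loss([[5, -5, -5, -5], [-5, -5, 5, -5]], [-5, 5, 5, -5], bits, 2)
```

The projected-bit term ties the two views together: a chunk distribution that puts mass on the wrong token is still rewarded for the individual bits it gets right.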
Decoder training employs a hard-message sampling strategy to overcome message-subset bias and improve generalization. A buffer of K messages is maintained and updated each epoch: easy samples are replaced with hard (low-accuracy) or unseen messages, following a schedule that linearly increases the fraction of hard samples until the buffer is frozen.
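One possible reading of this buffer update is sketched below. The details are assumptions made for illustration: the ramp ceiling `max_fraction`, the choice to keep the hardest messages and replace the rest with fresh random ones, and the per-message accuracy dictionary are not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

Message = Tuple[int, ...]

def hard_fraction(epoch: int, freeze_epoch: int, max_fraction: float = 0.5) -> float:
    """Linear ramp from 0 to max_fraction, reached at freeze_epoch."""
    return max_fraction * min(epoch, freeze_epoch) / freeze_epoch

def random_message(length: int, rng: random.Random) -> Message:
    return tuple(rng.randint(0, 1) for _ in range(length))

def update_buffer(buffer: List[Message], accuracy: Dict[Message, float],
                  epoch: int, freeze_epoch: int, rng: random.Random,
                  max_fraction: float = 0.5) -> List[Message]:
    """Return the next epoch's buffer; its size K never changes."""
    if epoch >= freeze_epoch:
        return list(buffer)  # buffer is frozen from here on
    k = len(buffer)
    n_hard = int(hard_fraction(epoch, freeze_epoch, max_fraction) * k)
    # Lowest accuracy first, so the hardest messages are retained.
    ranked = sorted(buffer, key=lambda m: accuracy.get(m, 0.0))
    kept = ranked[:n_hard]
    # The remaining (easier) slots are refilled with fresh random messages,
    # which are almost surely unseen for long payloads.
    fresh = [random_message(len(buffer[0]), rng) for _ in range(k - n_hard)]
    return kept + fresh

rng = random.Random(0)
buffer = [tuple(int(b) for b in format(i, "08b")) for i in range(10)]
accuracy = {m: i / 10 for i, m in enumerate(buffer)}  # buffer[0] is hardest
next_buffer = update_buffer(buffer, accuracy, epoch=5, freeze_epoch=10, rng=rng)
```

Early epochs are dominated by fresh messages, which widens coverage of the message space; later epochs retain more hard messages, which concentrates training on the patterns the decoder still gets wrong.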
After decoder training, the frozen decoder supervises multi-view watermark embedding: the 3DGS scene parameters are optimized on views rendered under random perturbations. The embedding loss combines a message bit loss, an image fidelity loss (RGB reconstruction + LPIPS), and an offset regularization term, with carefully balanced weights.
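Written out, an objective of this shape would be a weighted sum like the one below. The symbols are introduced here for illustration only: the weights $\lambda$, the split of the fidelity term into separate RGB and LPIPS parts, and the name $\mathcal{L}_{\text{off}}$ for the offset regularizer are assumptions, and the paper's own notation and weight values may differ.

```latex
\mathcal{L}_{\text{embed}}
  = \lambda_{\text{msg}}\,\mathcal{L}_{\text{msg}}
  + \lambda_{\text{rgb}}\,\mathcal{L}_{\text{rgb}}
  + \lambda_{\text{lpips}}\,\mathcal{L}_{\text{LPIPS}}
  + \lambda_{\text{off}}\,\mathcal{L}_{\text{off}}
```

Here $\mathcal{L}_{\text{msg}}$ is computed by passing perturbed renders through the frozen decoder, so gradients flow only into the scene parameters.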
Training schedules vary by message length: shorter messages train for fewer epochs and freeze the sampling buffer earlier, while longer ones train for more epochs and freeze it later. The Adam optimizer is used with the learning rates and weight decay specified in the paper. Experiments run on NVIDIA RTX 4090 GPUs.
Evaluation protocols include bit recovery accuracy on messages from the training set (In), unseen message space (Out), and their random mix, over multiple capacities from 16 to 128 bits. Robustness is tested with 2D distortions (noise, rotation, blur, crop, brightness, JPEG, combined) and 3D perturbations (Gaussian noise on SH coefficients, pruning, cloning of Gaussians).
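The bit recovery metric used across the In, Out, and Random settings can be sketched as below. The threshold of 0 on logits and the function names are assumptions for illustration; the paper may threshold probabilities or aggregate differently.

```python
from typing import List, Sequence

def bit_accuracy(decoded_logits: Sequence[float], target_bits: Sequence[int]) -> float:
    """Fraction of bits recovered after thresholding logits at 0."""
    if len(decoded_logits) != len(target_bits):
        raise ValueError("logits and targets must have the same length")
    predicted = [1 if x > 0 else 0 for x in decoded_logits]
    correct = sum(int(p == t) for p, t in zip(predicted, target_bits))
    return correct / len(target_bits)

def mean_bit_accuracy(logit_batches: List[Sequence[float]],
                      target_batches: List[Sequence[int]]) -> float:
    """Average per-message accuracy over one evaluation set (In, Out, or Random)."""
    scores = [bit_accuracy(l, t) for l, t in zip(logit_batches, target_batches)]
    return sum(scores) / len(scores)

single = bit_accuracy([2.0, -1.0, 0.5, -3.0], [1, 0, 0, 0])
```

Running the same metric separately on messages from the training buffer and on messages drawn from outside it gives the seen/unseen gap reported in the key findings.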
Baseline comparisons include GaussianMarker, 3D-GSW, 3DGS+WateRF, 3DGS+StegaNeRF, GuardSplat, and GuardSplat with bit-compressed tokenizer. Ablations analyze the effect of bit compression, dual-branch decoding, and hard-message sampling on accuracy and generalization.
A concrete example: for 64-bit embedding, the message is split into chunks of n=2 bits, giving 32 tokens that are padded to 77, encoded by the CLIP text encoder into 512-D embeddings, and decoded by the dual-branch decoder, which predicts chunk indices and bits simultaneously. Decoder training dynamically updates its message buffer to emphasize hard samples. The trained decoder then supervises optimization of the Gaussian splatting parameters to embed the watermark, with refinement performed under visual distortions. Final rendering yields high-fidelity images with recoverable 64-bit messages at ~97.9% accuracy on in-set (In) messages and 97.35% on random ones.
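The capacity arithmetic behind this example is short enough to check directly. The sketch below assumes all 77 context positions can carry message tokens, as the summary implies; the use of n=2 for 128-bit messages is an assumption, since only the 64-bit configuration is stated.

```python
import math

CONTEXT_LEN = 77  # CLIP text context length, as stated in the summary

def tokens_needed(message_bits: int, bits_per_token: int) -> int:
    """Number of tokens a message occupies at a given compression level."""
    return math.ceil(message_bits / bits_per_token)

def max_payload(bits_per_token: int, context_len: int = CONTEXT_LEN) -> int:
    """Largest message, in bits, that fits in the context window."""
    return bits_per_token * context_len

def candidates_per_position(bits_per_token: int) -> int:
    """Number of distinct tokens the decoder must separate at each position."""
    return 2 ** bits_per_token

for length in (64, 96, 128):
    used = tokens_needed(length, 2)
    print(f"{length} bits at n=2 -> {used} tokens, fits: {used <= CONTEXT_LEN}")
```

The trade-off is visible in `candidates_per_position`: each extra bit per token doubles the number of classes the chunk branch must distinguish, which is one reason larger payloads are harder to decode.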
Technical innovations
- A novel bit-compressed tokenization scheme that encodes multiple bits into a single semantic token, overcoming the fixed 77-token context limit of CLIP encoding.
- A dual-branch decoder architecture jointly predicting chunk-level discrete tokens and flat bit-level predictions to improve decoding reliability under compressed encoding.
- A hard-message sampling strategy that updates the training message buffer to prioritize historically difficult messages and introduce unseen ones, mitigating fixed-subset training bias and improving generalization.
- Integration of the above into a unified framework enabling high-capacity watermarking (up to 128 bits) of 3D Gaussian Splatting assets with robust decoding and preserved rendering fidelity.
Datasets
- Blender dataset — 8 synthetic 3D objects — public benchmark
- LLFF dataset — 7 real-world forward-facing scenes — public benchmark
Baselines vs proposed
- GaussianMarker: bit accuracy = 91.69% at 64 bits vs BitC-3DGS 97.35% (Random) / 97.90% (In)
- 3DGS+StegaNeRF: 85.27% at 64 bits vs BitC-3DGS 97.35%
- 3D-GSW: 90.45% at 64 bits vs BitC-3DGS 97.35%
- GuardSplat [7] (Random): 92.93% at 64 bits vs BitC-3DGS 97.35%
- GuardSplat [7] (In): 95.07% at 64 bits vs BitC-3DGS 97.90%
- GuardSplat + bit-compressed tokenizer: 88.41% at 96 bits vs BitC-3DGS 93.97% (Random), 84.45% at 128 bits vs 90.59%
- BitC-3DGS improves PSNR at 128 bits (In) from 33.01 to 34.32 and reduces LPIPS from 0.0099 to 0.0068 compared to GuardSplat + tokenizer
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.29583.

Fig 2: Overview of the proposed BitC-3DGS framework. The method contains two stages. Stage I (decoder pre-training): Messages sampled from the

Fig 3: Illustration of the bit branch in the proposed dual-branch decoder.

Fig 4: Visual comparisons with baselines [5]–[7] under L = 48 bits on the Blender [1] dataset and LLFF [53] dataset. For each setting, the left image

Fig 5: Qualitative results of BitC-3DGS under 96-bit and 128-bit payloads

Fig 6: Qualitative results of BitC-3DGS under 96-bit and 128-bit payloads
Limitations
- Decoder pre-training relies on a fixed-size message buffer with hard-sample prioritization but cannot fully cover all message combinations for very large payloads.
- Robustness evaluation includes common 2D/3D distortions but not targeted adaptive adversarial watermark removal or attacks.
- The approach depends on CLIP's fixed 77-token embedding limit, so further scaling beyond 128 bits might require additional token compression or a different backbone.
- Training and evaluation are performed on Blender and LLFF datasets; results might vary for more diverse or complex 3D scenes.
- The dual-branch decoder adds architectural complexity and training overhead compared to simpler single-branch decoders.
- The watermark embedding process requires multi-view rendering optimization, which could be computationally costly for very large datasets.
Open questions / follow-ons
- How can the bit compression scheme scale beyond 128 bits while maintaining decoding accuracy?
- Can more advanced adversarial training or robustness methods improve resistance to intentional watermark removal attacks?
- Is it possible to integrate watermark embedding directly into 3DGS training to reduce optimization overhead?
- Could alternative pretrained language or vision models with larger context windows replace CLIP to relax token limits?
Why it matters for bot defense
For bot-defense and CAPTCHA engineering, this work illustrates how to overcome hard fixed-capacity limits in semantic embedding-based hidden channel designs by employing bit-level compression combined with dedicated decoding architectures and balanced sample coverage during training. This can inspire watermark or hidden challenge embedding approaches involving high-capacity messages within restricted token budgets or embedding windows, such as leveraging pre-trained encoders like CLIP. The dual-branch decoder and hard-message sampling offer novel strategies for reliable recovery of compressed encoded information in adversarial or distribution-shifted environments. However, applying bit-compressed tokenization requires careful design of token-to-bit mappings and decoders, so practitioners should assess the decoding complexity and training costs against the desired capacity gains. Overall, BitC-3DGS presents a promising direction for robust high-capacity hidden messaging in visually rich 3D data representations.
Cite
@article{arxiv2605_29583,
title={BitC-3DGS: High-Capacity 3D Gaussian Splatting Watermarking via Bit Compression},
author={Yuquan Bi and Baosheng Yu and Yingke Lei and Jianwei Yang and Hongsong Wang and Jie Gui and Yuan Yan Tang and James Tin-Yau Kwok},
journal={arXiv preprint arXiv:2605.29583},
year={2026},
url={https://arxiv.org/abs/2605.29583}
}