ThermoCAPTCHA: Privacy-Preserving Human Verification with Farm-Resistant Traceable Tokens
Source: arXiv:2603.05915 · Published 2026-03-06 · By Shovon Paul, Md Imran Hossen, Xiali Hei
TL;DR
This paper addresses three longstanding limitations of CAPTCHAs: poor usability, privacy exposure, and weak resistance to outsourcing through CAPTCHA farms. Existing puzzle- and behavior-based CAPTCHAs impose a cognitive burden, collect fine-grained behavioral data that raises privacy concerns, and are vulnerable to remote human-solver farms that forward verification tokens across distinct client environments. ThermoCAPTCHA instead verifies human presence with real-time thermal imaging: it captures a single thermal image and detects a live human heat signature with a lightweight YOLOv4-tiny model, giving near-instant verification without puzzles or behavioral profiling. A cryptographically bound, traceable token mechanism restricts token reuse and forwarding, which is intended to block CAPTCHA farms.
The prototype reaches 96.70% detection accuracy at 73.60 ms latency on commodity hardware. The security evaluation reports robustness against man-in-the-middle attacks, replay, spoofing with heated objects or mannequins, and adversarial perturbations. A user study with 50 participants, including visually impaired individuals, reports better accuracy, speed, and accessibility than reCAPTCHA v2. The result is a candidate privacy-preserving, accessible, farm-resistant alternative to conventional CAPTCHAs.
Key findings
- ThermoCAPTCHA achieves 96.70% human detection accuracy on a test set of 120 images captured at 3 ft distance under controlled conditions (Table 1).
- The YOLOv4-tiny model processes thermal images with an average end-to-end verification latency of 73.60 ms on a low-power Intel i5-8550U CPU with MX150 GPU.
- Detection confidence peaks at frontal (90°) angles with mean score 0.91 ± 0.07 and degrades to 0.79 ± 0.118 at 130° horizontal angles (Table 2).
- ThermoCAPTCHA’s cryptographically bound tokens incorporate session ID and device fingerprint to prevent reuse or forwarding, closing a major CAPTCHA farm vulnerability.
- Security evaluation demonstrates resistance to man-in-the-middle manipulation, replay attacks, thermal spoofing using heated objects or mannequins, and adversarial perturbations.
- User study with 50 participants (including 20 visually impaired) shows ThermoCAPTCHA outperforms reCAPTCHA v2 in accuracy, completion time, and perceived usability.
- Thermal imaging preserves user privacy by encoding only coarse heat distribution patterns, avoiding biometric identification risks of RGB or behavioral signals.
Threat model
Adversaries aim to bypass ThermoCAPTCHA in three ways: presenting fake or manipulated thermal images (e.g., heated objects, replayed thermal captures, adversarial perturbations); relaying verification tokens obtained from remote human solvers or malware (CAPTCHA farms); and manipulating tokens as a man in the middle. They cannot break cryptographic signatures, access server-side secret keys, compromise the backend or the token binding, or execute code within the verified client environment while SRI and permission protections are in place. The model also assumes bounded clock skew between the website and the CAPTCHA server.
Methodology — deep read
The authors start by defining a threat model focused on adversaries who might try to manipulate thermal image input, replay verification tokens, or exploit weak token binding to bypass human verification. They assume benign client-side execution with Subresource Integrity (SRI) protection on the ThermoCAPTCHA JavaScript and native browser thermal camera permission enforcement. The server is assumed secure with encrypted storage, TLS communication, and controlled key management. The attacker cannot forge tokens cryptographically bound to a user session and device.
Data collection produced 286 thermal images from 26 participants, each contributing images at multiple horizontal viewing angles (50°-130°) and vertical tilts (±10°) to capture realistic pose variations. Images were captured with a FLIR Lepton 500-0771-01 thermal sensor paired with an OpenMV Cam H7 R2 microcontroller for high resolution, and with a PureThermal 2 module for the real-time capture prototype. The dataset was augmented by generating 20 variants per image with standard photometric and geometric transforms (rotation of ±40°, scaling, shear, shifts, and flips), yielding 3,520 images for training the detection model.
The core detection algorithm is a modified YOLOv4-tiny single-class detector adapted to identify human heat signatures in thermal images. The authors set the filter counts for one class and trained the bounding-box prediction heads accordingly. Inputs are normalized 416×416-pixel thermal images. The model is trained for 40,000 iterations with a batch size of 64 using standard loss and optimization settings, reaching a final loss of 0.0509 and 96.70% accuracy on held-out test images.
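As background for the single-class adaptation: in Darknet-style YOLO configurations, the convolutional layer feeding each detection head conventionally has (classes + 5) × anchors-per-head filters. The sketch below computes that count, assuming 3 anchors per head. The function name is illustrative, and the summary does not state the exact configuration values the authors used.

```python
def yolo_head_filters(num_classes: int, anchors_per_head: int = 3) -> int:
    """Filter count for the conv layer that feeds a Darknet-style YOLO head.

    Each anchor predicts 4 box offsets, 1 objectness score,
    and one score per class, hence (num_classes + 5) values per anchor.
    """
    if num_classes < 1 or anchors_per_head < 1:
        raise ValueError("num_classes and anchors_per_head must be positive")
    return (num_classes + 5) * anchors_per_head


# A human-only detector has a single class.
single_class_filters = yolo_head_filters(1)
# For comparison, an 80-class detector (e.g., COCO-sized label set).
many_class_filters = yolo_head_filters(80)
```

Under these assumptions a one-class head needs far fewer output filters than a general-purpose detector, which is part of why a single-class model stays lightweight.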
The verification protocol replaces behavioral challenges with a single thermal frame, captured when the human verification trigger fires (e.g., form submit). The client appends a freshly generated nonce and timestamp to the image metadata, computes a SHA-256 hash of the resulting binary object, and signs the hash with its private RSA key. The payload (image, nonce, timestamp, signature, and public key) is sent to the ThermoCAPTCHA server. The server checks that the timestamp is fresh and the nonce unused to prevent replay, verifies the signature, and runs the YOLOv4-tiny model to confirm human presence.
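The freshness and uniqueness checks described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 30-second window, the `ReplayGuard` class, the `payload_digest` helper, and the byte layout being hashed are all assumptions, and RSA signing and verification are omitted because they need a cryptography library outside the standard library.

```python
import hashlib
import time
from typing import Dict, Optional

MAX_SKEW_SECONDS = 30.0  # assumed tolerance; the source only says skew is bounded


def payload_digest(image: bytes, nonce: str, timestamp: float) -> str:
    """SHA-256 over image bytes plus nonce and timestamp (layout is an assumption)."""
    h = hashlib.sha256()
    h.update(image)
    h.update(b"|" + nonce.encode("utf-8"))
    h.update(b"|" + repr(timestamp).encode("utf-8"))
    return h.hexdigest()


class ReplayGuard:
    """Hypothetical server-side helper: rejects stale timestamps and reused nonces."""

    def __init__(self, max_skew: float = MAX_SKEW_SECONDS) -> None:
        self.max_skew = max_skew
        self._seen: Dict[str, float] = {}  # nonce -> client timestamp

    def accept(self, nonce: str, timestamp: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget nonces that have aged out; a replay of those fails the
        # freshness check below anyway, so the cache stays bounded.
        self._seen = {n: t for n, t in self._seen.items() if now - t <= self.max_skew}
        if abs(now - timestamp) > self.max_skew:
            return False  # too old, or dated too far in the future
        if nonce in self._seen:
            return False  # nonce already used inside the window
        self._seen[nonce] = timestamp
        return True
```

Because the signature covers the timestamp, an attacker cannot refresh a captured payload by editing the timestamp without invalidating the signature. The same window is what makes clock synchronization matter: a client clock that drifts beyond the tolerance is rejected even when the request is legitimate.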
Upon success, the server issues a cryptographically bound JWT encrypted with both the CAPTCHA server's private key and a key shared with the protected website. The token embeds a sessionID, device fingerprint, nonce, and expiration timestamp. For subsequent protected actions, the website sends the token back to the ThermoCAPTCHA server for validation, so a token forwarded to another session or device is rejected.
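To illustrate why binding a token to session and device context defeats forwarding, here is a simplified sketch. It swaps in an HMAC-SHA256 tag over JSON claims in place of the paper's JWT construction, so it shows the binding idea only. The function names, claim keys (`sid`, `dfp`), and the 120-second lifetime are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _tag(key: bytes, body: str) -> str:
    return _b64(hmac.new(key, body.encode("utf-8"), hashlib.sha256).digest())


def issue_token(key: bytes, session_id: str, device_fp: str, nonce: str,
                ttl: float = 120.0, now: Optional[float] = None) -> str:
    """Issue a token bound to one session and one device fingerprint."""
    now = time.time() if now is None else now
    claims = {"sid": session_id, "dfp": device_fp, "nonce": nonce, "exp": now + ttl}
    body = _b64(json.dumps(claims, sort_keys=True).encode("utf-8"))
    return body + "." + _tag(key, body)


def validate_token(key: bytes, token: str, session_id: str, device_fp: str,
                   now: Optional[float] = None) -> bool:
    """Accept only an unexpired, untampered token presented from the bound context."""
    now = time.time() if now is None else now
    try:
        body, tag = token.split(".")
    except ValueError:
        return False  # not exactly two dot-separated parts
    if not hmac.compare_digest(tag.encode("utf-8"), _tag(key, body).encode("utf-8")):
        return False  # wrong key or modified claims
    claims = json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    return (claims["exp"] >= now
            and claims["sid"] == session_id
            and claims["dfp"] == device_fp)
```

In this sketch a token solved in a farm worker's browser carries that worker's session and fingerprint, so presenting it from the customer's session fails the comparison even though the tag itself is valid.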
Security evaluations tested robustness against adversarial attempts including MITM manipulation, replay attacks, physical spoofing with heated objects or mannequins, and adversarial perturbations to the thermal frames. Real-world usability was assessed with 50 participants, including those with visual impairments, comparing accuracy, time, and subjective usability against Google’s reCAPTCHA v2.
Reproducibility materials such as the dataset, training scripts, code, and questionnaires are provided in an appendix, supporting replication efforts.
Technical innovations
- Use of real-time thermal imaging combined with a lightweight YOLOv4-tiny model to detect live human presence for CAPTCHA verification, eliminating puzzles and behavioral profiling.
- Introduction of cryptographically bound, traceable verification tokens embedding session and device context to prevent CAPTCHA farm token forwarding and reuse.
- Combining digital signatures over thermal image hashes plus nonce/timestamp metadata to ensure freshness and integrity of verification inputs without encrypting privacy-preserving thermal frames.
- Demonstration that coarse thermal heat signatures provide effective liveness detection with high accuracy and robustness against spoofing and adversarial perturbations.
Datasets
- ThermoCAPTCHA thermal image dataset — 286 images from 26 participants collected using FLIR Lepton 500 thermal sensor — custom, non-public but released as appendix materials
Baselines vs proposed
- reCAPTCHA v2: lower human verification accuracy and longer completion times compared to ThermoCAPTCHA (quantitative values unspecified)
- YOLOv4-tiny detector on the thermal dataset: 96.70% accuracy; this is the same model ThermoCAPTCHA deploys, so it is not an independent baseline
- Latency: ThermoCAPTCHA 73.60 ms verification latency on a low-power i5 CPU; latency figures for other CAPTCHAs are higher or unspecified
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2603.05915.

Fig 1: High-level workflow of a modern CAPTCHA [39].

Fig 2: ThermoCAPTCHA system overview.

Fig 4: Thermal samples collected under diverse background conditions, including distant subjects, direct sunlight, heat-

Fig 5: Printed human image and heated mannequin com-

Fig 6: Examples of non-human thermal sources used to evaluate robustness to misuse and incidental hot objects. None of
Limitations
- Training and testing dataset limited to 286 images from 26 participants, which is small by deep learning standards and may limit generalization to broader population or environmental variation.
- Thermal imaging requires specialized hardware (thermal camera modules), limiting immediate deployment on commodity devices lacking thermal sensors.
- Robustness tested only against a subset of spoofing attacks (heated objects and mannequins) and adversarial perturbations; future unforeseen adversarial strategies may exist.
- Replay and MITM protections depend on accurate time synchronization and nonce management between the website and the CAPTCHA server; misalignment could cause false rejects.
- The usability study's sample of 50 participants (including 20 visually impaired) is moderate; larger studies across more diverse demographics are needed to confirm the accessibility benefits.
Open questions / follow-ons
- How well does ThermoCAPTCHA generalize to diverse environmental conditions (outdoor lighting, extreme temperatures) beyond the collected dataset?
- Can thermal imaging-based CAPTCHAs be implemented efficiently on mobile devices or integrated with existing web cameras lacking dedicated thermal sensors?
- What are the long-term privacy implications of collecting thermal data at scale, especially considering evolving biometric research on thermal signatures?
- How resistant is the system to future, more sophisticated adversarial machine learning attacks specifically targeting thermal imaging?
Why it matters for bot defense
For bot-defense practitioners, ThermoCAPTCHA offers a promising new verification modality that eliminates the cognitive and privacy drawbacks of puzzle-based and behavioral CAPTCHAs. By leveraging thermal imaging, it sidesteps risks from biometric tracking inherent in RGB or behavioral signals and improves accessibility, particularly for visually impaired users. Importantly, its cryptographically bound tokens address a critical gap in current CAPTCHA farm defenses by tightly linking verification tokens to session and device context, mitigating token forwarding attacks that render many existing CAPTCHAs ineffective.
Implementing this approach requires integrating thermal sensors and modifying verification protocols, which may be feasible for high-value or sensitive web actions with elevated attack risk. However, evaluation under broader deployment scenarios and environmental conditions is needed to understand operational constraints. Overall, ThermoCAPTCHA pairs presence-based verification with context-bound tokens, an architectural change that could move future defenses beyond interaction biometrics and puzzles.
Cite
@article{arxiv2603_05915,
title={ThermoCAPTCHA: Privacy-Preserving Human Verification with Farm-Resistant Traceable Tokens},
author={Shovon Paul and Md Imran Hossen and Xiali Hei},
journal={arXiv preprint arXiv:2603.05915},
year={2026},
url={https://arxiv.org/abs/2603.05915}
}