
Behavioral Biometrics for Automatic Detection of User Familiarity in VR

Source: arXiv:2510.12988 · Published 2025-10-14 · By Numan Zafar, Priyo Ranjan Kundu Prosun, Shafique Ahmad Chaudhry

TL;DR

This paper addresses the problem of automatically detecting a user's familiarity with virtual reality (VR) systems by analyzing behavioral biometrics during VR interactions. The authors focus on a passcode-based door unlocking task in VR, leveraging an interaction common in collaborative virtual spaces. They hypothesize that hand movement dynamics during this familiar yet VR-mediated task differ between novice and experienced VR users. The study involves 26 participants balanced by self-reported VR experience, who perform the passcode entry using both controller-based and hand-tracking input modalities.

The novelty lies in combining multi-modal VR input data (controller and hand tracking) with state-of-the-art deep learning classifiers to classify user familiarity, and in evaluating cross-device generalization. The experimental results show that hand-tracking data yields higher classification accuracy (up to 92.05%) than controller data (up to 83.42%), and that combining modalities further improves accuracy to 94.19%. Cross-device testing (training on controller data, testing on hand-tracking data) still achieves 78.89%, demonstrating promising generalization. This work is among the first to automatically infer VR familiarity from behavioral biometrics, enabling VR systems to adapt to individual experience levels to improve usability and reduce frustration.

Key findings

  • Hand-tracking input modality classification accuracy: up to 92.05% (PIN 3197, window size 90 frames) using InceptionTime model.
  • Controller-based input modality accuracy: up to 83.42% (PIN 2648, window size 100) with InceptionTime.
  • Cross-device classification (train on controller, test on hand tracking) achieves 78.89% accuracy (PIN 2468, window size 110) with FCN.
  • Mixed-device classification (combine controller and hand-tracking in training) achieves 94.19% accuracy (PIN 3197, window size 120) using FCN model.
  • Longer sliding window sizes (up to 120 frames) generally improve classifier performance, indicating rich motion context aids familiarity detection.
  • PIN complexity affects classification accuracy, with diagonally complex PINs like 2648 yielding higher accuracy than line-based PINs like 1379.
  • MLP, FCN, and InceptionTime classifiers outperform other tested architectures; FCN often delivers the most robust results across conditions.
  • Equal participant split between 13 experienced and 13 inexperienced VR users; total of 2080 trials collected across modalities.

Threat model

The adversary is a passive observer with access to motion trajectory data from VR interactions (controller or hand tracking) who aims to infer a user's VR experience level (novice vs. experienced). The adversary cannot manipulate the user's input devices or the VR environment, and tries to classify familiarity solely from behavioral biometrics during the passcode entry task. The threat model does not assume attacker access to internal VR system states or credentials.

Methodology — deep read

As stated in the threat model above, the classifier operates only on motion data captured from VR devices during passcode entry; it has no access to internal VR system state or credentials.

Data were collected from 26 participants (13 experienced, 13 inexperienced) using a Meta Quest Pro headset. Each participant performed a virtual door unlocking task by entering one of four four-digit PIN codes (1379, 2468, 2648, 3197). Each PIN was entered 10 times per input modality (hand tracking and controller), yielding 40 trials per modality per participant and 2080 trials in total. The study was conducted in two sessions separated by one month, with the order of input modalities counterbalanced across participants. 3D positional and orientation data were recorded at 72 frames per second.
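As a quick sanity check, the reported totals follow directly from the study design (all figures from the paper; the snippet below simply redoes the arithmetic):

```python
# Sanity check of the data-collection arithmetic reported in the paper.
participants = 26        # 13 experienced + 13 inexperienced
modalities = 2           # controller, hand tracking
pins = 4                 # 1379, 2468, 2648, 3197
repeats = 10             # entries per PIN per modality

per_modality = pins * repeats                            # 40 trials per modality per participant
total_trials = participants * modalities * per_modality
assert per_modality == 40 and total_trials == 2080
```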

The core classifiers evaluated were Multilayer Perceptron (MLP), Fully Convolutional Network (FCN), and InceptionTime architectures. Input windows of motion data were extracted with window lengths ranging from 50 to 120 frames and a step size of one frame. Models map sliding windows of 3D dominant-hand positional data to a binary label indicating VR experience (inexperienced or experienced).
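The paper's preprocessing code is not released, so the following is only a minimal sketch of sliding-window extraction as described; the function name and array shapes are illustrative assumptions:

```python
import numpy as np

def sliding_windows(trial_xyz: np.ndarray, window: int, step: int = 1) -> np.ndarray:
    """Split one trial's (T, 3) dominant-hand positions, sampled at 72 fps,
    into overlapping windows of shape (num_windows, window, 3)."""
    T = trial_xyz.shape[0]
    if T < window:
        return np.empty((0, window, 3))
    starts = np.arange(0, T - window + 1, step)
    return np.stack([trial_xyz[s : s + window] for s in starts])

# Example: a 3-second trial at 72 fps with 90-frame windows and step 1.
trial = np.random.randn(216, 3)
windows = sliding_windows(trial, window=90)
print(windows.shape)  # (127, 90, 3); each window inherits the trial's binary label
```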

Training used cross-entropy loss with label smoothing and the Adam optimizer. Experiments involved training/testing splits at the participant level (80% for training, 20% for testing) to ensure evaluation on unseen users. Four classification scenarios were studied: controller-only training/testing, hand-tracking-only, cross-device (train controller, test hand tracking), and mixed-device (train and test on both modalities combined).
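The sketch below mirrors this setup: the label-smoothed cross-entropy loss, Adam optimizer, and participant-level 80/20 split come from the paper, while the FCN layer widths (taken from the standard time-series FCN) and the smoothing factor of 0.1 are assumptions not given in the summary.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import GroupShuffleSplit

class FCN(nn.Module):
    """Standard time-series FCN; the layer widths here are assumptions."""
    def __init__(self, in_ch: int = 3, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_ch, 128, 8, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, 5, padding="same"), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, 3, padding="same"), nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # global average pooling over time
        )
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):  # x: (batch, channels=3, window)
        return self.head(self.features(x).squeeze(-1))

# Dummy stand-ins for the extracted windows (channels-first) and labels.
X = np.random.randn(5000, 3, 90).astype(np.float32)
y = np.random.randint(0, 2, 5000).astype(np.int64)   # 0 = inexperienced, 1 = experienced
groups = np.random.randint(0, 26, 5000)              # participant id per window

# Participant-level 80/20 split: test users never appear in training.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))

model = FCN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # smoothing factor assumed

for _ in range(10):  # the paper trains for 1000 epochs
    optimizer.zero_grad()
    logits = model(torch.from_numpy(X[train_idx]))
    loss = criterion(logits, torch.from_numpy(y[train_idx]))
    loss.backward()
    optimizer.step()
```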

Models were trained on an NVIDIA RTX 4090 GPU for 1000 epochs. Evaluation metrics included accuracy and area under the ROC curve (AUC). Ablations over PINs and window lengths were performed to analyze performance variability, and cross-modality transfer experiments assess generalization across input devices.
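Accuracy and AUC can then be computed on windows from held-out users. Whether scores are aggregated per window or per trial is not stated in this summary, so the sketch below scores individual windows with illustrative values:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# y_true: window-level labels for held-out users; p1: model's P(experienced).
y_true = np.array([0, 0, 1, 1, 1, 0])
p1 = np.array([0.2, 0.6, 0.8, 0.9, 0.4, 0.1])

print("accuracy:", accuracy_score(y_true, p1 > 0.5))  # 4/6 correct ~= 0.67
print("AUC:", roc_auc_score(y_true, p1))              # ~= 0.89
```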

As a concrete example, for PIN 3197 with hand-tracking data at 90-frame windows, the InceptionTime model achieves 92.05% accuracy in distinguishing experienced from inexperienced VR users based purely on hand movement trajectories during PIN entry. The participant-level split ensured that no test participant's data leaked into training.

The paper does not state that code or other reproducibility artifacts have been released. The dataset is not public but may be available upon request. The study is limited to short-term sessions, without evaluation of longitudinal stability or wider demographics.

Technical innovations

  • First study to automatically detect VR user familiarity across multi-modal input (controller and hand tracking) using behavioral biometrics.
  • Integration of sliding window techniques with deep classifiers (MLP, FCN, InceptionTime) to capture temporal hand movement dynamics for familiarity classification.
  • Cross-device generalization evaluation where models trained on controller data classify hand-tracking data without retraining.
  • Demonstrated that combining modalities in training significantly improves familiarity detection accuracy, surpassing single-modality approaches.
  • Application of label smoothing during training to improve robustness in VR familiarity classification.

Datasets

  • VR door unlocking dataset — 2080 trials from 26 participants (13 experienced, 13 inexperienced), collected with a Meta Quest Pro headset; not publicly available

Baselines vs proposed

  • Controller-only, InceptionTime: 83.42% accuracy (PIN 2648, window size 100) vs hand-tracking-only, InceptionTime: 92.05% (PIN 3197, window size 90)
  • Cross-device (train on controller, test on hand tracking), FCN: 78.89% accuracy (PIN 2468, window size 110) vs controller-only: 83.42%
  • Mixed-device (controller and hand tracking combined), FCN: 94.19% accuracy (PIN 3197, window size 120) vs controller-only: 83.42% and hand-tracking-only: 92.05%
  • In the mixed-device scenario, MLP and InceptionTime trail FCN (approximately 75-88% vs 94.19%).

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2510.12988.

Fig 1: Participants interacting with the VR door unlocking application.

Fig 2: Right-hand movement trajectories from ten trials for each of the four passcodes using hand tracking-based interaction, comparing participants with no VR experience to experienced participants.

Fig 3: Right-hand movement trajectories from ten trials for each of the four passcodes using controller-based interaction.

Limitations

  • Small sample size (26 participants) limits generalizability across wider demographics or VR experience levels.
  • Short-term sessions; no longitudinal evaluation of familiarity detection stability over time or repeated use.
  • Dataset collected only on one VR device (Meta Quest Pro), restricting device diversity and cross-hardware applicability.
  • Study constrained to a single specific VR task (passcode-based door entry); other VR tasks or interactions not evaluated.
  • No adversarial analysis to test robustness of classifiers against spoofing or mimicry in familiarity detection.
  • Lack of publicly available dataset or code hampers reproducibility and external validation.

Open questions / follow-ons

  • How stable and reliable is user familiarity detection longitudinally as users gain VR experience over days or weeks?
  • Can familiarity detection generalize to more complex or varied VR tasks involving bimanual and head/eye movement features?
  • How does demographic diversity (age, physical ability) impact movement biometrics and familiarity classification accuracy?
  • What are the effects of adversarial attempts to spoof or obfuscate familiarity signals in VR behavioral biometrics?

Why it matters for bot defense

This research offers useful insights for bot-defense and CAPTCHA practitioners interested in behavioral biometrics in immersive environments. Leveraging nuanced hand movement patterns to infer prior user experience could inspire novel CAPTCHAs or verification challenges tailored to VR systems. Moreover, understanding cross-device behavioral generalization informs the design of more robust verification mechanisms that remain effective across multiple VR input modalities.

For bot-defense, automatically detecting user familiarity enables adaptive difficulty adjustments in CAPTCHAs or interaction flows in VR, reducing friction for experienced users while maintaining security against automated or novice attacks. However, the small dataset and limited task variety warrant caution before deploying such classifiers broadly. This work lays groundwork toward biometric-based user profiling in VR that could complement classical CAPTCHAs, especially as immersive technologies become more widespread.

Cite

```bibtex
@article{arxiv2510_12988,
  title={Behavioral Biometrics for Automatic Detection of User Familiarity in VR},
  author={Numan Zafar and Priyo Ranjan Kundu Prosun and Shafique Ahmad Chaudhry},
  journal={arXiv preprint arXiv:2510.12988},
  year={2025},
  url={https://arxiv.org/abs/2510.12988}
}
```

Read the full paper

Articles are CC BY 4.0 — feel free to quote with attribution