Optimizing Mouse Dynamics for User Authentication by Machine Learning: Addressing Data Sufficiency, Accuracy-Practicality Trade-off, and Model Performance Challenges

Source: arXiv:2504.21415 · Published 2025-04-30 · By Yi Wang, Chengyv Wu, Yang Liao, Maowei You

TL;DR

This paper addresses three challenges in mouse dynamics-based user authentication: determining how much data is sufficient, balancing accuracy against practicality through segment length, and improving model performance through better temporal feature extraction. The authors propose a statistical method that combines Gaussian Kernel Density Estimation (KDE) with Kullback-Leibler (KL) divergence to estimate the minimal data volume needed for reliable model training, avoiding both redundant and insufficient data. They introduce the Mouse Authentication Unit (MAU), whose length is optimized via Approximate Entropy (ApEn) analysis to maximize discriminative information while minimizing latency. To capture both local velocity patterns and long-term temporal dependencies, the Local-Time Mouse Authentication (LT-AMouse) framework integrates a 1D-ResNet for local feature extraction with a GRU recurrent network for sequence modeling. Experiments on two mouse dynamics datasets, Balabit (daily usage) and DFL (laboratory), show that the proposed method can reduce training data volume by a factor of about 10 on DFL and achieves blind-attack AUCs of 98.52% (DFL) and 94.65% (Balabit), outperforming previous state-of-the-art methods. Overall, the paper offers practical guidance on data collection, segment sizing, and model design to improve the feasibility and reliability of mouse dynamics authentication.

Key findings

Using KDE and KL divergence, the data volume needed for individual users was reduced from 114,114 to 73,779 mouse velocity samples on Balabit and by roughly a factor of 10 (from 6,726,000 to 691,000) on the DFL dataset.
The Approximate Entropy (ApEn) analysis showed that authentication accuracy increases with MAU length, but with diminishing returns once the absolute slope of the entropy drop is ≤1×10⁻⁴; optimal MAU lengths lie between 90 and 130 for Balabit and between 110 and 160 for DFL.
The LT-AMouse model, which combines a 1D-ResNet with a GRU, achieved AUCs of 98.52% on DFL and 94.65% on Balabit under imbalanced training sample ratios (8:1 and 5:1, respectively), surpassing previous models.
The model maintained robustness under a blind attack setting by including unseen users in the test set.
Short MAU lengths (e.g., 10-50 samples) significantly degraded AUC and EER, indicating insufficient behavioral information.
Mouse velocity alone was used as the input feature to improve privacy and reduce dimensionality, in contrast to prior works that use more complex multi-modal features.
The KDE density curves stabilized as data volume increased, evidenced by converging KL divergence values below 1×10⁻⁴, validating the data sufficiency criterion.
The trade-off between authentication speed and accuracy can be tuned by selecting the MAU length based on ApEn slope analysis.
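The ApEn slope criterion in the findings above can be illustrated with a minimal sketch. It uses the standard Pincus definition of approximate entropy; the embedding dimension `m=2`, the tolerance `r = 0.2 × standard deviation`, and the helper `first_flat_length` are assumptions made for illustration, not settings or code taken from the paper.

```python
import math
import statistics

def _phi(series, m, r):
    """Mean log-frequency of length-m templates that match within tolerance r."""
    n = len(series) - m + 1
    templates = [series[i:i + m] for i in range(n)]
    total = 0.0
    for a in templates:
        # Chebyshev distance; the self-match is counted, as in the standard definition.
        matches = sum(
            1 for b in templates
            if max(abs(x - y) for x, y in zip(a, b)) <= r
        )
        total += math.log(matches / n)
    return total / n

def approximate_entropy(series, m=2, r=None):
    """Approximate entropy ApEn(m, r) = phi_m - phi_(m+1)."""
    if len(series) < m + 2:
        raise ValueError("series too short for the chosen m")
    if r is None:
        r = 0.2 * statistics.pstdev(series)  # common default, assumed here
    return _phi(series, m, r) - _phi(series, m + 1, r)

def first_flat_length(lengths, apen_values, slope_threshold=1e-4):
    """Hypothetical helper: first candidate length at which the absolute
    slope of the ApEn curve falls to or below slope_threshold."""
    points = list(zip(lengths, apen_values))
    for (l0, a0), (l1, a1) in zip(points, points[1:]):
        if abs((a1 - a0) / (l1 - l0)) <= slope_threshold:
            return l1
    return None  # the curve never flattens within the candidates
```

In use, one would compute an ApEn value for each candidate MAU length (for example, averaged over a user's segments of that length) and pass the two lists to `first_flat_length`. How the paper aggregates ApEn per length is not stated in this summary, so that step is left to the caller.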

Threat model

The adversary is an unauthorized user attempting to bypass authentication by imitating or forging mouse dynamics behavior. The attacker does not have perfect knowledge or mimicry capability of legitimate users’ detailed mouse velocity sequences, and blind attacks using unseen users simulate opportunistic adversaries. The system assumes mouse velocity data is reliably collected and not adversarially perturbed in real-time. More sophisticated adversarial settings like impersonation under mimicry or poisoning are out of scope.

Methodology — deep read

Threat Model & Assumptions: The adversary is any unauthorized user attempting to mimic or impersonate an authorized user’s mouse behavior. The system relies solely on mouse movement velocity sequences without additional biometrics, assuming attacks do not perfectly replicate user mouse dynamics over sufficient data volumes. Blind attacks are modeled by testing on unseen users.
Data: The study uses two publicly available mouse dynamics datasets: Balabit (10 users, daily usage data) and DFL (21 users, laboratory-controlled). Each user’s mouse movement data is split across multiple CSV session files containing timestamps, cursor coordinates, clicks, and movement states. Mouse velocity sequences are computed as the Euclidean distance between adjacent points divided by a fixed time interval. Training and testing datasets have imbalanced sample ratios (8:1 positive to negative for DFL, 5:1 for Balabit). Additional unseen users are held out for robustness evaluation.
Architecture & Algorithm: The LT-AMouse model takes fixed-length 1D mouse velocity segments (the Mouse Authentication Unit, MAU) as input. First, a 1D-CNN ResNet module extracts local velocity features while preserving temporal sequence length via residual connections. These outputs feed into a Gated Recurrent Unit (GRU) network which models longer-range temporal dependencies. The final hidden state vector is classified through a fully connected layer with Softmax activation, optimized for binary user verification.
Training Regime: The model is trained as a binary classifier (legitimate user vs. others) with cross-entropy loss, using the Adam optimizer with default parameters (β1=0.9, β2=0.999). The study does not specify the number of epochs or the batch size but emphasizes hyperparameter consistency across baselines for fair comparison. Data augmentation and random-seed strategies are not detailed.
Evaluation Protocol: Performance metrics include Area Under ROC Curve (AUC), Equal Error Rate (EER), F1 score for imbalanced classes, and Defense Success Rate (DSR) against blind attacks from unseen users. KDE and KL divergence guide data volume sufficiency estimation. Approximate Entropy (ApEn) analysis determines optimal MAU segment length balancing accuracy and efficiency. Baseline comparisons involve existing SVM, RF, CNN, or RNN-based mouse dynamics models under identical conditions.
Reproducibility: The paper does not mention a public release of code or trained weights. The datasets used are publicly available. Exact replication details (random seeds, hardware) are not specified. The KDE and ApEn methods are described mathematically, which allows reproduction, but practical implementation details are sparse.
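The preprocessing described under Data and Architecture (velocity from adjacent cursor points, then fixed-length MAU segments) might look like the following minimal sketch. The function names are hypothetical, the sampling interval `dt` is assumed constant, and non-overlapping segmentation is an assumption: the summary does not say whether the paper uses overlapping windows.

```python
import math

def velocity_sequence(points, dt):
    """Speed between consecutive cursor samples.

    points: list of (x, y) cursor coordinates sampled every dt seconds.
    dt: fixed sampling interval in seconds (assumed constant).
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]

def split_into_maus(velocities, mau_length):
    """Cut a velocity sequence into non-overlapping fixed-length segments.

    A trailing remainder shorter than mau_length is dropped.
    """
    if mau_length <= 0:
        raise ValueError("mau_length must be positive")
    n_full = len(velocities) // mau_length
    return [
        velocities[i * mau_length:(i + 1) * mau_length]
        for i in range(n_full)
    ]
```

Each returned segment would then serve as one fixed-length 1D input to the model. Dropping the trailing remainder keeps every segment the same length at the cost of discarding a few samples per session.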

Concrete Example End-to-End: The procedure to find a sufficient data volume for a user begins by computing KDE density estimates over increasing mouse velocity sequence lengths (e.g., in increments of 200 samples). KL divergence is computed between the KDEs of consecutive data volumes. When the KL divergence falls below 1×10⁻⁴ and its change stabilizes, the data volume is deemed sufficient. For example, on DFL user 19, the KDE density curves converged and the KL divergence dropped below the threshold at ~691,000 velocity samples. Next, the MAU length is chosen via ApEn: once the absolute slope of the ApEn curve drops below 1×10⁻⁴, further lengthening yields diminishing information gain; for DFL user 19, the optimal segment length was 110-160 samples. Finally, the LT-AMouse model is trained on these segments to verify user identity, achieving an AUC of 98.52% in an evaluation that includes blind-attack samples.
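The data-sufficiency loop in this example can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the bandwidth uses Silverman's rule of thumb, the KL divergence is taken from the previous estimate to the current one on a uniform grid, and the stopping rule checks only the threshold (the additional "change stabilizes" condition is omitted). All function names are hypothetical.

```python
import math
import statistics

def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate on the grid points."""
    n = len(samples)
    if bandwidth is None:
        # Silverman's rule of thumb; the bandwidth choice is an assumption.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]

def kl_divergence(p, q, step, eps=1e-12):
    """Riemann-sum approximation of KL(p || q) on a uniform grid."""
    return step * sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q)
    )

def sufficient_volume(velocities, grid, increment=200, threshold=1e-4):
    """Smallest prefix length at which consecutive KDEs differ by less
    than threshold in KL divergence, or None if that never happens."""
    step = grid[1] - grid[0]
    previous = None
    for n in range(increment, len(velocities) + 1, increment):
        current = gaussian_kde(velocities[:n], grid)
        if previous is not None and kl_divergence(previous, current, step) < threshold:
            return n
        previous = current
    return None
```

This direct implementation costs time proportional to the number of samples times the grid size for every increment, so it is only practical for small demonstrations. At the volumes reported above (hundreds of thousands of samples), a binned or library-backed KDE would be the more realistic choice.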

Technical innovations

A statistical estimator for sufficient mouse dynamics training data using Gaussian KDE and converging KL divergence to avoid redundant or insufficient data.
Introduction of the Mouse Authentication Unit (MAU) concept optimized via Approximate Entropy (ApEn) to balance authentication accuracy and real-time practicality by choosing segment length.
The LT-AMouse framework combining 1D-ResNet for local velocity feature extraction with a GRU module to capture long-term temporal dependencies of mouse movement sequences.
Use of mouse velocity alone as input feature to enhance privacy and reduce input dimensionality, while effectively capturing behavioral biometric patterns.

Datasets

Balabit — 10 users — daily usage mouse dynamics publicly available
DFL — 21 users — laboratory-controlled mouse dynamics publicly available

Baselines vs proposed

Previous state-of-the-art methods on the DFL dataset: AUC ~95% vs LT-AMouse: 98.52%
Previous methods on the Balabit dataset: AUC ~90-92% vs LT-AMouse: 94.65%
Data volume reduction on DFL: original ~6.7 million velocity points vs proposed ~0.69 million (~10x reduction)
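
For readers who want to reproduce comparisons like the AUC figures above, the two headline metrics can be computed from raw classifier scores as in this minimal sketch. It assumes that a higher score means "more likely the legitimate user"; the paper's actual scoring pipeline is not described in this summary, and the function names are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a genuine sample outscores an impostor
    sample, with ties counted as half (Mann-Whitney formulation)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def equal_error_rate(scores_pos, scores_neg):
    """Approximate EER: sweep thresholds over the observed scores and
    return the mean of FAR and FRR where the two are closest."""
    best_gap, best_rate = None, None
    for t in sorted(set(scores_pos) | set(scores_neg)):
        # Accept a sample when its score is >= t.
        frr = sum(p < t for p in scores_pos) / len(scores_pos)   # genuine rejected
        far = sum(n >= t for n in scores_neg) / len(scores_neg)  # impostor accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate
```

The pairwise AUC loop is quadratic in the number of scores, which is fine for a sanity check but slow for large test sets, where a rank-based computation is preferable.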

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2504.21415.

Fig 1: Structure of the Paper

Fig 2: Amount of Individual User Mouse Behavior in the

Fig 3: Amount of Individual User Mouse Behavior in the DFL

Fig 4: Illustrations of the different distribution density with

Fig 5: User Authentication Model

Fig 6: Comparison of KDE of Different Volume of Mouse Velocity Data for Balabit and DFL Dataset Example

Fig 7: Proper and Total Volume of Balabit Dataset

Fig 8: Proper and Total Volume of DFL Dataset

Limitations

No adversarial attack beyond blind user testing; robustness against sophisticated mimicry or poisoning attacks not evaluated.
Lack of detailed training hyperparameters (epochs, batch size) and hardware specifications potentially affecting reproducibility.
Evaluation is limited to two datasets; generalization to other mouse dynamics datasets or environments is not tested.
Assumes mouse velocity alone suffices, potentially missing richer behavioral signals in multi-modal data.
No ablation studies isolating contributions of ResNet vs GRU modules or comparing alternative architectures.
Noise, device variability, and behavioral change over longer periods (concept drift) are not deeply investigated.

Open questions / follow-ons

How does the model perform under active adversarial attacks aiming to mimic mouse velocity patterns?
Can multi-modal mouse data (position, clicks, scrolls) combined with velocity improve accuracy or robustness?
How stable and adaptive is the model over long-term usage with behavioral drift or device changes?
Would alternative sequence models like transformers outperform the 1D-ResNet+GRU for mouse dynamics features?

Why it matters for bot defense

This work informs bot-defense engineers working on behavioral biometric authentication through mouse dynamics, especially when designing systems that must operate reliably under data volume constraints and real-time latency requirements. The proposed KDE-KL divergence method offers a principled approach to estimating the minimal sufficient data for training personalized authentication models, optimizing resource use during model building. The MAU length optimization via Approximate Entropy helps practitioners tune segment length to balance accuracy against system responsiveness, which is critical for user experience in active authentication contexts. The LT-AMouse architecture, which combines local convolutional and recurrent temporal modeling, provides a blueprint for deep feature extractors that capture subtle human interaction patterns that are difficult for bots to mimic. However, because mouse dynamics-based authentication is typically a secondary mechanism, the limitations around adversarial robustness and device/environment variability caution against sole reliance on these techniques. Integrating such behavioral models with CAPTCHAs or fraud detection heuristics could strengthen layered defenses against automated attacks and impersonation. Overall, the paper offers actionable methodology and model design insights, but further adversarial validation would be needed before deployment in high-stakes authentication settings.

Cite

bibtex

@article{arxiv2504_21415,
  title={Optimizing Mouse Dynamics for User Authentication by Machine Learning: Addressing Data Sufficiency, Accuracy-Practicality Trade-off, and Model Performance Challenges},
  author={Yi Wang and Chengyv Wu and Yang Liao and Maowei You},
  journal={arXiv preprint arXiv:2504.21415},
  year={2025},
  url={https://arxiv.org/abs/2504.21415}
}
