Privacy-Preserving Federated Learning Framework for Risk-Based Adaptive Authentication
Source: arXiv:2508.18453 · Published 2025-08-25 · By Yaser Baseri, Abdelhakim Senhaji Hafid, Dimitrios Makrakis, Hamidreza Fereidouni
TL;DR
FL-RBA2 is a federated learning framework for risk-based adaptive authentication (RBA) that tries to solve two problems at once: privacy leakage from centralized risk engines and poor model quality under highly heterogeneous user data. The paper’s core idea is to transform diverse per-user authentication signals—behavioral, biometric, contextual, interaction-based, and knowledge-based—into per-session similarity vectors so that clients can participate in FL with more IID-like inputs. On top of that, it adds differential privacy and message authentication codes to protect model updates and message integrity.
What is new here is not just “FL for authentication,” but the similarity-transformation layer that sits in front of learning. The authors argue that this reduces Non-IID bias, supports cold-start mitigation via clustering-based risk labels, and enables a single federated pipeline that handles multiple modalities. They also provide game-based security proofs in the random oracle model. The empirical section claims strong high-risk-user detection on keystroke, mouse, and contextual datasets, plus resilience to model inversion and inference attacks under differential privacy, but the excerpt provided does not include the actual numeric results, so those cannot be reconstructed here.
Key findings
- The framework maps heterogeneous authentication signals into per-session similarity vectors before federated aggregation, which is the paper’s main mechanism for handling Non-IID user data in RBA.
- FL-RBA2 combines clustering-based risk labeling with lightweight local supervised models to mitigate cold-start, rather than relying on a purely supervised global classifier.
- The protocol adds differential privacy noise to model updates and MAC-based integrity/authenticity checks, explicitly targeting model inversion, inference, tampering, and replay attacks.
- The authors claim formal game-based security proofs in the Random Oracle Model for privacy, correctness, and adaptive security; however, the excerpt does not provide theorem statements or concrete reduction bounds.
- Experiments were run on keystroke, mouse, and contextual datasets, but the excerpt does not specify dataset sizes, feature counts, or the exact metrics/values reported.
- The paper claims the method remains effective for high-risk user detection even under “strong DP constraints,” but the exact privacy budgets and accuracy drop are not visible in the provided text.
- The framework adopts FastDTW, a linear-time approximation of quadratic DTW, for behavioral sequence matching so that on-device similarity computation stays practical.
Threat model
The adversary is primarily a passive or semi-passive server that follows the protocol but attempts to infer sensitive user information from federated updates, together with network attackers who may attempt tampering or replay. The client application is assumed hardened and uncompromised, the communication channel is encrypted, and MACs are available to verify integrity and authenticity. The user is not assumed to behave honestly, but the paper does not present a full malicious-client defense model; instead it relies on local processing, DP, and secure messaging to reduce leakage and manipulation risk.
Methodology — deep read
Threat model and assumptions: the server is modeled as honest-but-curious, meaning it follows the protocol but may try to infer sensitive information from received updates. The end user is treated as untrusted, while the client application is assumed to be hardened against compromise or tampering. Communication channels are assumed secure via encryption, and timestamps plus MACs are used to stop tampering and replay. The security goal is not full malicious robustness against a fully compromised client; rather, the framework is designed to prevent passive leakage at the server and integrity attacks in transit.
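The timestamp-plus-MAC defense against tampering and replay can be sketched as follows. This is a minimal illustration, not the paper's concrete construction: HMAC-SHA256, the freshness window, and the shared-key setup are all assumptions, since the excerpt does not specify the MAC scheme or its parameters.

```python
import hmac
import hashlib
import time

FRESHNESS_WINDOW = 30.0  # seconds; hypothetical replay-rejection window


def tag_update(key: bytes, payload: bytes, timestamp: float) -> bytes:
    """MAC over payload || timestamp, so a replayed or altered message fails."""
    msg = payload + str(timestamp).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify_update(key: bytes, payload: bytes, timestamp: float,
                  tag: bytes, now: float) -> bool:
    """Reject stale timestamps (replay) and wrong tags (tampering)."""
    if abs(now - timestamp) > FRESHNESS_WINDOW:
        return False
    expected = tag_update(key, payload, timestamp)
    return hmac.compare_digest(expected, tag)


# Hypothetical client/server exchange:
key = b"shared-session-key"
ts = time.time()
tag = tag_update(key, b"model-update-bytes", ts)
assert verify_update(key, b"model-update-bytes", ts, tag, time.time())
```

A real deployment would also bind the tag to a sender identity and a round counter; the sketch only shows why both the timestamp and the MAC are needed, since either check alone misses one of the two attacks.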
Data and representation: the paper’s raw inputs are heterogeneous authentication features grouped into five categories: knowledge-based, biometric, behavioral, contextual, and interaction-based. The text discusses passwords/security questions, fingerprints/face/iris/voice, typing and mouse traces, location/IP/device/time, and application/browser/social activity. The key preprocessing step is to compare a live session against a registered or historical reference and convert the comparison into a normalized similarity score. For example, knowledge-based features become binary 0/1 matches, biometric vectors use cosine similarity, behavioral sequences use DTW or FastDTW with normalization by a user-specific maximum DTW distance, contextual features use set membership or normalized distances such as Haversine for location, and interaction features use Jaccard similarity. These individual scores are then assembled into a similarity vector that is intended to be more IID-like than raw user features. The excerpt does not give dataset cardinalities, train/test split percentages, or whether sessions are user-disjoint across splits; those details are therefore unclear.
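The per-modality scoring described above can be sketched in a few lines. This is an illustration under assumptions: the field names (`answer`, `face`, `loc`, `apps`) and the 1000 km normalization cap are hypothetical, and the behavioral DTW score is omitted here since the paper handles it with a separate FastDTW step.

```python
import math


def cosine_similarity(a, b):
    """Biometric embeddings: cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def jaccard_similarity(a: set, b: set) -> float:
    """Interaction features: overlap of current vs historical item sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def location_similarity(cur, ref, max_km=1000.0):
    """Contextual: map distance to [0, 1]; max_km is a hypothetical cap."""
    return max(0.0, 1.0 - haversine_km(*cur, *ref) / max_km)


def session_similarity_vector(session, reference):
    """Assemble one per-session similarity vector across modalities."""
    return [
        1.0 if session["answer"] == reference["answer"] else 0.0,  # knowledge
        cosine_similarity(session["face"], reference["face"]),     # biometric
        location_similarity(session["loc"], reference["loc"]),     # contextual
        jaccard_similarity(session["apps"], reference["apps"]),    # interaction
    ]
```

Every entry lands in [0, 1] regardless of the raw modality, which is the property that makes the resulting vectors look more IID-like across clients.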
Architecture and algorithm: FL-RBA2 has five stages. First, feature engineering selects and categorizes the modalities relevant to a login attempt. Second, similarity evaluation transforms each modality into a standardized score using feature-appropriate metrics. Third, similarity vector aggregation combines those scores into a unified vector, which is the actual input to learning. Fourth, each client trains a local risk assessment model on its own similarity vectors to predict authentication risk. Fifth, federated aggregation combines local updates into a global model. The paper claims this transformation reduces statistical heterogeneity because all clients now train on a common representation space, even if the underlying raw modalities differ. The novelty is not a new neural network architecture in the excerpted text; it is the representation layer and the federated pipeline around it. The paper also introduces clustering-based unsupervised risk labeling to reduce cold-start issues: when labeled risk data are scarce, clusters are used to derive risk labels that feed a supervised local model.
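The clustering-based cold-start labeling can be illustrated with a deliberately tiny sketch. The excerpt does not name the clustering algorithm, so this assumes a simple two-cluster split on each session's mean similarity score, labeling the low-similarity cluster high risk; the real pipeline would then train the local supervised model on these (vector, label) pairs.

```python
import random


def two_means_1d(values, iters=20, seed=0):
    """Tiny 1-D 2-means: split scalar scores into two clusters."""
    rng = random.Random(seed)
    c = sorted(rng.sample(values, 2))  # initial centroids
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            # True (index 1) when v is strictly closer to centroid c[1]
            groups[abs(v - c[1]) < abs(v - c[0])].append(v)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return c


def cluster_risk_labels(similarity_vectors):
    """Unsupervised risk labels: sessions whose mean similarity falls in the
    lower cluster are labeled high risk (1), the rest low risk (0)."""
    scores = [sum(v) / len(v) for v in similarity_vectors]
    c = sorted(two_means_1d(scores))
    return [1 if abs(s - c[0]) < abs(s - c[1]) else 0 for s in scores]
```

The appeal of this pattern for cold start is that it produces training targets before any confirmed fraud labels exist; its risk is that cluster boundaries are dataset-specific, which the open questions below return to.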
Training and optimization: the excerpt does not give the optimizer, learning rate, batch size, number of local epochs, number of communication rounds, client fraction, or any seed strategy. It also does not specify whether the local model is a classical classifier, a small MLP, or another architecture. The only implementation-level detail visible is the use of FastDTW as a linear-time approximation to standard DTW because vanilla DTW is O(N^2) in time and space and is too expensive for real-time or edge deployment. Differential privacy is added to model updates before transmission, but the exact mechanism (e.g., Gaussian vs. Laplace noise, clipping norm, epsilon/delta values) is not included in the provided text.
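To make the DTW cost concrete, here is the exact quadratic algorithm the paper avoids, plus the user-maximum normalization it describes. The `user_max_dtw` calibration constant is a hypothetical stand-in for whatever per-user maximum the paper derives from enrollment data; FastDTW replaces the exact recurrence below with a linear-time recursive coarsening, which is why it suits on-device use.

```python
def dtw_distance(a, b):
    """Exact DTW between two scalar sequences: O(len(a) * len(b)) time and
    space, which is what makes vanilla DTW too expensive for real-time use."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]


def behavioral_similarity(session_seq, reference_seq, user_max_dtw):
    """Map a DTW distance into [0, 1] by dividing by a per-user maximum."""
    if user_max_dtw <= 0:
        return 1.0
    dist = dtw_distance(session_seq, reference_seq)
    return max(0.0, 1.0 - dist / user_max_dtw)
```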
Evaluation protocol: the paper states that it evaluates on keystroke, mouse, and contextual datasets, and claims the framework is effective for high-risk user detection and resilient to model inversion/inference attacks even under strong DP. However, the excerpt does not show the exact metrics, class balance, attacker models, privacy budgets, or baseline list used in the empirical section. In the theory section, the authors also present cryptographic reasoning: MACs protect integrity/authenticity, while DP is invoked to limit leakage from aggregated updates. An end-to-end example from the text would be a login attempt where a user’s current typing pattern, device, location, and browser context are compared to historical references; each modality yields a normalized similarity score, those scores form a vector, the local model maps that vector to a risk level, and the server then requests stronger or weaker authentication based on that risk. The local update is noise-perturbed for DP, MAC-tagged, and then aggregated into the global model. Reproducibility is limited in the provided excerpt: no code release, frozen weights, or public dataset statement is visible, and the paper appears to rely on a mix of real-world and possibly curated datasets without enough detail in the excerpt to verify exact replication steps.
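The client-side perturbation and server-side aggregation in that end-to-end flow can be sketched as follows. The excerpt does not specify the DP mechanism, so this assumes the common clip-then-Gaussian-noise recipe; `clip_norm` and `sigma` are hypothetical parameters, not values from the paper, and MAC verification (shown earlier) is assumed to happen before aggregation.

```python
import math
import random


def clip_and_noise(update, clip_norm=1.0, sigma=0.5, seed=None):
    """DP-style local update: clip the L2 norm to clip_norm, then add
    Gaussian noise scaled by sigma * clip_norm before transmission."""
    rng = random.Random(seed)
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    return [x + rng.gauss(0.0, sigma * clip_norm) for x in clipped]


def federated_average(client_updates, weights=None):
    """Server side: weighted average of the (noised, verified) updates."""
    if weights is None:
        weights = [1.0] * len(client_updates)
    total = sum(weights)
    dim = len(client_updates[0])
    return [
        sum(w * u[i] for w, u in zip(weights, client_updates)) / total
        for i in range(dim)
    ]
```

Clipping bounds each client's influence on the average, which is what lets calibrated noise yield a formal (epsilon, delta) guarantee; without the concrete budgets from the paper, the sketch only shows the mechanics, not the claimed privacy level.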
Technical innovations
- Similarity-vector transformation that converts heterogeneous authentication modalities into a common IID-like representation for federated training.
- Clustering-based risk labeling to mitigate cold-start when labeled authentication risk data are sparse.
- Combination of differential privacy and MAC-based message authentication in the FL-RBA2 protocol.
- Use of FastDTW to make behavioral similarity computation more practical for on-device or real-time authentication.
Datasets
- Keystroke dataset — size and source not specified in the excerpt
- Mouse dataset — size and source not specified in the excerpt
- Contextual dataset — size and source not specified in the excerpt
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2508.18453.

Fig 1: Risk Based Adaptive Authentication Workflow

Fig 2: FL-RBA2 System and Security Model

Fig 3: FL-RBA2 Methodology Process Flow

Fig 5: Federated Learning with DP in FL-RBA2
Limitations
- The excerpt does not report the actual experimental numbers, so claims about performance gains, privacy budgets, and attack resilience cannot be independently checked from the provided text.
- Dataset provenance, sizes, feature counts, and train/validation/test splits are not visible in the excerpt, which makes reproducibility hard to assess.
- The threat model is relatively narrow: the server is honest-but-curious and the client application is assumed hardened, so the framework does not clearly address a fully malicious client or a compromised endpoint.
- The similarity transformation may reduce heterogeneity, but the paper does not show in the excerpt whether it preserves enough signal across all modalities or whether some modalities become weakly informative after normalization.
- The security proof is described at a high level, but the excerpt does not include theorem statements, assumptions on MAC/DP parameters, or concrete reduction losses.
- No evidence is shown in the excerpt for distribution-shift testing across devices, geographies, seasons, or adversarially adapted spoofing behavior beyond the stated inversion/inference resilience.
Open questions / follow-ons
- How much accuracy is lost, per modality, when raw features are reduced to similarity vectors, and which modalities benefit most from this transformation?
- Does the clustering-based cold-start labeling generalize across new organizations, devices, and attacker populations, or is it dataset-specific?
- What privacy-utility trade-off is obtained for concrete epsilon/delta settings, and how does that compare to alternative FL privacy mechanisms such as secure aggregation plus clipping?
- Can the framework survive adaptive attackers who deliberately shift behavior to mimic a user’s similarity vector distribution over time?
Why it matters for bot defense
For bot defense and CAPTCHA practitioners, the main takeaway is that authentication risk scoring can be moved closer to the client while keeping raw behavior/context local, which is attractive when privacy constraints block central collection. The similarity-vector idea is especially relevant if your system already compares live behavior to a user baseline: rather than sharing raw keystrokes, pointer traces, device fingerprints, or context records, you could federate over normalized comparisons. That could reduce exposure while still allowing a shared risk model.
At the same time, the paper’s design also highlights a practical caveat: once you compress diverse signals into similarity scores, the quality of the downstream model depends heavily on how those scores are defined and normalized. A bot-defense team would want to test whether the same transformation hides useful attacker evidence, whether it remains stable under spoofing, and whether it degrades under distribution shift across browsers, devices, or accessibility tooling. The paper is most relevant as an architectural pattern: local feature comparison, privacy-preserving update sharing, and federated risk learning, rather than as a drop-in authentication classifier.
Cite
@article{arxiv2508_18453,
  title={Privacy-Preserving Federated Learning Framework for Risk-Based Adaptive Authentication},
  author={Yaser Baseri and Abdelhakim Senhaji Hafid and Dimitrios Makrakis and Hamidreza Fereidouni},
  journal={arXiv preprint arXiv:2508.18453},
  year={2025},
  url={https://arxiv.org/abs/2508.18453}
}