Active Authentication via Korean Keystrokes Under Varying LLM Assistance and Cognitive Contexts
Source: arXiv:2509.24807 · Published 2025-09-29 · By Dong Hyun Roh, Rajesh Kumar
TL;DR
This paper investigates the robustness of keystroke dynamics as a behavioral biometric for active user authentication under two emerging conditions: Large Language Model (LLM) assistance and varying cognitive load, with a focus on Korean-language typing. Unlike prior work, which is mostly limited to English or simple fixed-text inputs, the study uses a recently collected Korean keystroke dataset of 50 users performing three realistic writing scenarios: bona fide free composition, paraphrasing LLM-generated responses, and transcribing LLM outputs. Cognitive load is modeled via Bloom’s Taxonomy, enabling detailed analysis of typing under different mental demands. The authors develop a continuity-aware keystroke segmentation and feature extraction pipeline, apply MRMR feature selection with mutual information scoring, and classify with SVM, MLP, and XGB models. They find that reliable authentication is feasible even under LLM-assisted and cognitively varying contexts, with Equal Error Rates (EERs) ranging roughly from 5.1% to 10.4%. XGB and SVM outperform MLP, and comparisons across scenario-aware and cognition-aware configurations highlight the value of context-resilient training strategies. The work bridges gaps in previous keystroke dynamics research by incorporating modern, LLM-mediated writing behaviors and cognitive modeling, extends the investigation to Korean free-text scenarios, and offers insights for designing behavioral biometrics that remain robust across varied user states and AI assistance.
Key findings
- The XGB classifier achieves the lowest average EER of 5.5% in scenario-unaware, cognition-unaware settings.
- SVM closely follows with mean EER of 5.8% under similar settings, both outperforming MLP (EER > 6.7%).
- The transcribed LLM-response typing scenario yields the lowest EERs (e.g., SVM-HH: 5.15%) compared to the paraphrased and bona fide scenarios.
- Higher cognitive load conditions (HH) produce lower EERs (~5.6% XGB) than low load (LL) or mismatched conditions (HL, LH).
- Scenario-unaware training, which pools typing scenarios, consistently outperforms scenario-aware training by ~1% EER.
- Using continuity-aware Key Interval Time (KIT) feature extraction avoids inflated timing artifacts from non-sequential typing.
- A window size of 200 keystrokes with a 150-keystroke overlap yields the best balanced accuracy across classifiers.
- MRMR+Mutual Information feature selection improves authentication generalization versus univariate feature ranking.
Threat model
The adversary is an impersonator who attempts to mimic the legitimate user's typing patterns to bypass continuous authentication. They have access to keystroke timing data collected during typing but do not know the user's cognitive state or which scenario (bona fide, paraphrased, transcribed) the user is in. They cannot physically possess the legitimate user's device or credentials. The system must verify the genuine user continuously and detect impostors via behavioral patterns, considering that natural typing varies due to cognitive load and AI assistance.
Methodology — deep read
Building on the threat model above, the system cannot rely on fixed passwords: it must authenticate the user continuously during use, without direct knowledge of the user's cognitive state or writing scenario.
The study uses a publicly available Korean keystroke dataset from Roh et al. consisting of 69 total users; 50 users with data from two phases were selected for training (Phase 1) and testing (Phase 2) to avoid data leakage. The dataset captures raw keydown and keyup timestamps under three distinct typing scenarios: bona fide free composition, paraphrasing of ChatGPT-generated responses, and transcription of ChatGPT outputs. Each scenario includes responses to six questions categorized by cognitive load based on Bloom’s Taxonomy levels (remember, understand, apply, analyze, evaluate, create). This labeling allows analysis of cognitive context impact on typing.
In preprocessing, raw keystroke events are segmented by question index and temporal continuity flags identify interruptions due to users revisiting previous questions, avoiding inflated inter-key timing. Overlapping sliding windows across keystroke sequences (window sizes 50-400 keys and overlaps 25-75%) extract Key Hold Time (KHT) and continuity-aware Key Interval Time (KIT) features per window. KIT intervals crossing discontinuities are excluded.
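The continuity-aware windowing step can be sketched as follows. This is a minimal illustration, not the paper's code: the per-event `continuous` flag, the down-down definition of KIT, and the mean/std window summaries are all assumptions standing in for the paper's actual event schema and feature set.

```python
import numpy as np

def extract_windows(events, window=200, overlap=150):
    """events: list of (key, keydown_ts, keyup_ts, continuous) tuples,
    where `continuous` is False when the keystroke follows a temporal
    discontinuity (e.g., the user jumped back to an earlier question)."""
    khts, kits = [], []
    for i, (_, down, up, cont) in enumerate(events):
        khts.append(up - down)                    # Key Hold Time
        if i > 0 and cont:
            kits.append(down - events[i - 1][1])  # down-down Key Interval Time
        else:
            kits.append(None)                     # exclude intervals across gaps
    step = window - overlap
    feats = []
    for start in range(0, len(events) - window + 1, step):
        w_kht = khts[start:start + window]
        # Continuity-aware KIT: drop the excluded (None) intervals entirely,
        # rather than letting an inflated cross-gap latency skew the window.
        w_kit = [k for k in kits[start:start + window] if k is not None]
        feats.append([np.mean(w_kht), np.std(w_kht),
                      np.mean(w_kit), np.std(w_kit)])
    return np.array(feats)
```

With the paper's reported best setting (window 200, overlap 150), consecutive windows advance by 50 keystrokes, so a 250-keystroke session yields two overlapping windows.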
A Minimum Redundancy Maximum Relevance (MRMR) approach with mutual information scoring selects informative, non-redundant features, improving generalization over univariate methods. Three classifiers are evaluated: Support Vector Machine (SVM) with RBF kernel, Multilayer Perceptron (MLP) with two hidden layers, and Extreme Gradient Boosting (XGB), chosen for prior success in keystroke biometrics and suitability to tabular data.
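A greedy MRMR selector with mutual-information scoring might look like the sketch below; it illustrates the general technique (relevance to the label minus mean redundancy with already-selected features), while the paper's exact scoring and stopping criteria may differ.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_mi(X, y, k, random_state=0):
    """Greedy MRMR: repeatedly pick the feature maximizing
    MI(feature, label) - mean MI(feature, already-selected features)."""
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]       # start with the most relevant
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([
                mutual_info_regression(X[:, [j]], X[:, s],
                                       random_state=random_state)[0]
                for s in selected])
            score = relevance[j] - redundancy    # relevant but non-redundant
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

The redundancy penalty is what separates MRMR from univariate ranking: a near-duplicate of an already-selected feature scores poorly even if it is individually highly informative.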
Models train on Phase 1 data and test on Phase 2, simulating real deployment. Stratified 5-fold cross-validation tunes hyperparameters and evaluates balanced accuracy and Equal Error Rate (EER). Four experimental configurations explore scenario-awareness and cognition-awareness independently or combined. Multiple cognitive training-testing matching schemes are tested to understand cognitive load mismatch effects. DET (Detection Error Tradeoff) curves and violin plots visualize user-level EER distributions and classifier tradeoffs.
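The EER metric used throughout is the operating point where the false accept rate (impostor windows accepted) equals the false reject rate (genuine windows rejected). A generic threshold-sweep estimator over per-user score distributions, not the paper's evaluation code, can be sketched as:

```python
import numpy as np

def eer(genuine, impostor):
    """Estimate the Equal Error Rate from genuine and impostor score arrays
    (higher score = more likely genuine), by sweeping candidate thresholds."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best = (np.inf, 1.0)                   # (|FAR - FRR|, EER estimate)
    for t in thresholds:
        far = np.mean(impostor >= t)       # impostors wrongly accepted
        frr = np.mean(genuine < t)         # genuine users wrongly rejected
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]
```

For two unit-variance Gaussian score distributions whose means differ by 2, this yields an EER near 16%, which helps calibrate intuition for the paper's reported 5-10% range.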
Overall, the methodology rigorously incorporates realistic behavioral variation due to AI assistance and cognitive demand, with careful data partitioning to avoid leakage, continuity-aware feature engineering, and extensive model evaluation across user and contextual strata. The code and dataset are publicly released for reproducibility. A concrete example: continuous typing on paraphrased text is segmented, identified temporal discontinuities are excised from KIT computation, sliding windows extract features, MRMR selects among them, and an SVM produces user verification decisions at roughly 5-7% EER.
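The end-to-end verification step can be sketched as follows. The RBF-kernel SVM and Phase 1 train / Phase 2 test split follow the paper's setup; the synthetic Gaussian features standing in for per-window KHT/KIT statistics are purely illustrative.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic per-window features: genuine-user windows vs. impostor windows.
# Real inputs would be KHT/KIT statistics from the sliding-window pipeline.
X_phase1 = np.vstack([rng.normal(1.0, 1.0, (100, 8)),    # genuine enrollment
                      rng.normal(-1.0, 1.0, (100, 8))])  # impostor enrollment
y_phase1 = np.array([1] * 100 + [0] * 100)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_phase1, y_phase1)                # train on Phase 1 data only

# Phase 2 windows simulate later sessions; scores drive accept/reject.
X_phase2 = np.vstack([rng.normal(1.0, 1.0, (50, 8)),
                      rng.normal(-1.0, 1.0, (50, 8))])
y_phase2 = np.array([1] * 50 + [0] * 50)
scores = clf.decision_function(X_phase2)   # per-window verification scores
accuracy = np.mean((scores > 0).astype(int) == y_phase2)
```

In a deployed system the decision threshold would be set at the desired FAR/FRR trade-off (e.g., the EER operating point) rather than fixed at zero.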
Technical innovations
- Continuity-aware segmentation algorithm that excludes temporal discontinuities when computing Key Interval Time (KIT) features to mitigate inflated timing artifacts from users revisiting questions.
- Application of Bloom’s Taxonomy-based cognitive load labeling to keystroke authentication, enabling context-aware modeling and evaluation of cognitive state impacts.
- First systematic evaluation of Korean keystroke dynamics authentication under Large Language Model (LLM) assistance in paraphrasing and transcription scenarios.
- Combined scenario-aware and cognition-aware modeling frameworks to analyze and improve authentication robustness across writing context and cognitive state variations.
Datasets
- Korean Keystrokes Dataset — 69 users total, 50 users subset used — public via GitHub https://github.com/rajeshjnu2006/Korean-keystrokes-auth-icmla2025
Baselines vs proposed
- SVM scenario-unaware cognition-unaware EER = 5.8% vs proposed (XGB) = 5.5%
- MLP scenario-unaware cognition-unaware EER = 6.7% vs proposed XGB = 5.5%
- SVM scenario-aware Bona fide EER = 7.50% vs XGB = 6.26%
- SVM scenario-aware Transcribed EER = 5.15% vs MLP = 5.78%
- Scenario-aware training EERs ~1% higher than scenario-unaware for all classifiers
Limitations
- Limited user sample size (50 users) restricts generalizability to broader populations or domains.
- No explicit adversarial mimicry or spoofing attacks tested; robustness against active impostors unknown.
- Dataset restricted to Korean language; findings may not generalize to languages with different typing characteristics or LLM integration.
- No evaluation under distributional shifts such as device changes, keyboard layouts, or longer-term temporal drift.
- Deep learning models beyond MLP not explored, which may capture richer temporal patterns with larger datasets.
- Cognitive load labeling based on Bloom’s Taxonomy is indirect and inferred from question difficulty, not real-time mental state measurement.
Open questions / follow-ons
- How robust is keystroke-based authentication against deliberate adversarial imitation attacks, especially under LLM-assisted conditions?
- Can more sophisticated deep learning approaches improve performance or robustness in cognitively and contextually diverse typing?
- How does typing behavior and authentication performance evolve over longer time intervals or device/environment changes?
- What is the impact of multilingual or code-switching typing patterns combined with LLM assistance on keystroke biometrics?
Why it matters for bot defense
For bot-defense and continuous authentication practitioners, this paper provides important insights on how modern LLM usage modifies keystroke dynamics and the necessity of incorporating scenario- and cognition-aware modeling to maintain verification reliability. The continuity-aware segmentation technique helps filter out noisy timing artifacts, improving feature quality for behavioral classifiers. Findings caution that authentication models trained solely on standard free-text inputs may degrade when users incorporate AI-generated content via paraphrasing or transcription, making adaptive modeling crucial.
The use of cognitive load modeling highlights that mental state variation significantly affects keystroke patterns, suggesting behavioral security systems must consider cognitive context or train on mixed cognitive data for robust deployment. Although the study focuses on Korean, the principles apply broadly where AI-assisted writing is prevalent. The reported EER ranges (5-10%) demonstrate active authentication feasibility but also indicate room for improvement under realistic, heterogeneous usage scenarios. Practitioners designing CAPTCHAs or continuous verification mechanisms can leverage these results to anticipate shifts in behavioral biometrics as AI-mediated content creation grows and to build more context-resilient defense systems.
Cite
@article{arxiv2509_24807,
  title={Active Authentication via Korean Keystrokes Under Varying LLM Assistance and Cognitive Contexts},
  author={Dong Hyun Roh and Rajesh Kumar},
  journal={arXiv preprint arXiv:2509.24807},
  year={2025},
  url={https://arxiv.org/abs/2509.24807}
}