Trait, Not State: The Durability of Reading Identity in Social Highlighting

Source: arXiv:2606.12904 · Published 2026-06-11 · By Kazuki Nakayashiki, Keisuke Watanabe

TL;DR

This paper asks whether a reader's document-selection signature on a social web highlighter is a stable "trait" or a transient "state." Prior work showed that which documents a user highlights is individually distinctive, but only at a single point in time. Here the authors track that distinctiveness longitudinally over gaps of more than two years. They freeze each reader's first six months of highlights as a profile and compare later selections against matched controls from the same calendar era, which separates personal drift from supply and content drift. The individual signature proves highly durable: a profile's own-vs-other advantage shows no detectable decay through 12 months and remains largely intact at 24+ months. The signal is not merely repeated reading of the same domains, since ~90% of the advantage survives removing all profile domains. Within-person drift is slow but present: recent history outperforms older history by +0.042 AP. Personal profiles, even those built from a reader's earliest data, prospectively outperform every tested non-personal baseline, including popularity and neighborhood co-reading priors, at roughly 3x the average precision. Overall, the study operationalizes "trait" as a stable individual selection fingerprint that persists under continued platform engagement, and finds that signal stable, robust, and actionable over years of activity on the Glasp social web highlighter.

Key findings

Freezing a 6-month profile produces a fine-layer own-vs-other AP advantage of +0.188 [0.160, 0.216], closely reproducing prior cross-sectional levels (+0.169).
Paired retention at 6–12 months is R = 1.003 [0.854, 1.184], showing no statistically detectable decay in own-vs-other advantage within users over this horizon (n=212).
No statistically significant own-vs-other advantage decay is observed up to 24+ months; the farthest bin (n=65) suggests any decline is modest and unresolved within error.
Approximately 90% of the durable identity advantage remains after excluding all documents from profile domains, indicating it is not reducible to repeated domains.
Within-person drift is slow but measurable: a volume-matched recent half-profile outperforms the old half by +0.042 [+0.020, +0.064] average precision.
Prospectively, personal profiles (whole history, recent, or earliest documents) rank users' next reads at ~3x the AP of non-personal priors such as lifetime popularity or neighborhood co-reading popularity.
The neighborhood co-reading prior performs worse than random (0.192 vs ~0.227 AP), indicating individual long-tail interests are not captured by simple collaborative signals on this dataset.
Paired contrasts remove cohort composition biases, confirming that the individual signature’s stability is not an artifact of longer-tenured active users over time.

Threat model

The adversary is assumed to have access to user highlighting histories and can observe which documents each user chose to highlight over time. They cannot observe impression logs or control which documents a user is exposed to, so every observed selection mixes exposure and choice. Nor can they separate the influence of external personalized feeds or algorithmic feedback loops from the reader's own preferences. Because the platform is browser-extension based, internal algorithmic confounds are limited, but exposure outside the platform remains unknown.

Methodology — deep read

The study investigates whether a reader's document selection behavior on a social web highlighter represents a stable personal trait or a transient state. As in the threat model above, the analysis has access to highlighting histories but not to impression logs or external recommendation signals, so exposure and choice cannot be separated.

Data come from Glasp, a popular social web highlighting service with over a million users. The authors draw a uniform random sample of 191,223 user records, of which 405 qualify as heavy readers (≥60 clean web documents spread across ≥12 months, with ≥8 active months), ensuring adequate data for longitudinal analysis. The median qualifying reader has 133 documents. Imported documents, Kindle/social/video domains, and bursty bulk imports are excluded to avoid confounds.

For each reader, a "profile" is constructed by sampling min(20, all available) documents from their first six calendar months of highlighting activity and computing a centroid of their highlight-span text embeddings (512-d, Text-Embedding-3-Small model). This profile is frozen and used to rank candidate documents from later periods, binned by the gap between the end of the profile window and the candidate's selection date: [0–1), [1–3), [3–6), [6–12), [12–24), and 24+ months.
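The profile construction and gap binning described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the use of cosine similarity for ranking, and the plain-list vector representation are all assumptions, and real embeddings would come from the embedding model rather than being hand-written.

```python
import math
import random

# Gap bins in months, each treated as [lo, hi); the last bin is open-ended.
GAP_BINS = [(0, 1), (1, 3), (3, 6), (6, 12), (12, 24), (24, math.inf)]

def build_profile(doc_embeddings, k=20, seed=0):
    """Sample min(k, available) document embeddings and return their centroid."""
    rng = random.Random(seed)
    sample = rng.sample(doc_embeddings, min(k, len(doc_embeddings)))
    dim = len(sample[0])
    return [sum(vec[i] for vec in sample) / len(sample) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity (an assumed scoring function; the source does not name one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def gap_bin(gap_months):
    """Map the gap between profile-window end and selection date to its bin."""
    for lo, hi in GAP_BINS:
        if lo <= gap_months < hi:
            return (lo, hi)
    raise ValueError("gap_months must be non-negative")

def rank_candidates(profile, candidates):
    """candidates: list of (doc_id, embedding). Returns doc_ids, most similar first."""
    ordered = sorted(candidates, key=lambda c: cosine(profile, c[1]), reverse=True)
    return [doc_id for doc_id, _ in ordered]
```

Because the profile is built once and never updated, any later change in ranking quality reflects the reader and the candidate pool, not a moving profile.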

Candidate negatives are drawn time-matched from other readers' selections during the same calendar interval as positives to control for supply drift. Two regimes of negatives are used:

Coarse (easy): negatives and controls sampled globally from entire cohort.
Fine (hard, primary): negatives and controls sampled from the reader's top-25 profile-similar peers to isolate fine-grained individual identity.
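The two regimes differ only in which readers supply the negatives. A minimal sketch, assuming profiles are stored as centroid vectors keyed by a hypothetical reader id, that peer similarity is cosine similarity, and that the target's own documents are removed from the pool:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k_peers(target_id, profiles, k=25):
    """Return the k readers whose profile centroids are most similar to the target's."""
    target = profiles[target_id]
    scored = [(rid, cosine(target, vec)) for rid, vec in profiles.items() if rid != target_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [rid for rid, _ in scored[:k]]

def negative_pool(target_id, profiles, selections, regime, k=25):
    """selections: reader_id -> set of doc_ids selected in the same calendar interval.

    'coarse' draws from the whole cohort, 'fine' from the top-k similar peers.
    """
    if regime == "fine":
        sources = top_k_peers(target_id, profiles, k)
    elif regime == "coarse":
        sources = [rid for rid in profiles if rid != target_id]
    else:
        raise ValueError("regime must be 'coarse' or 'fine'")
    pool = set()
    for rid in sources:
        pool |= selections.get(rid, set())
    # Assumption: documents the target also highlighted are not valid negatives.
    return pool - selections.get(target_id, set())
```

Restricting negatives to similar peers makes the task harder, which is consistent with the fine regime reporting a smaller absolute advantage than the coarse one.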

Each ranking is scored by average precision (AP). Own-vs-other advantage is computed as the difference between the reader's profile AP and the mean AP of three seeded control profiles on identical candidate sets. Controls exclude documents highlighted by the target user.
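The scoring step can be illustrated with a standard average-precision computation. The sketch below assumes every positive appears somewhere in the ranked candidate list and that each control profile has already produced its own ranking of the same candidates; the function names are illustrative.

```python
from statistics import mean

def average_precision(ranked_ids, positives):
    """Mean of precision@rank taken at each rank where a positive appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives) if positives else 0.0

def own_vs_other_advantage(own_ranking, control_rankings, positives):
    """Own-profile AP minus the mean AP of the control profiles on the same candidates."""
    own_ap = average_precision(own_ranking, positives)
    control_ap = mean([average_precision(r, positives) for r in control_rankings])
    return own_ap - control_ap
```

For example, with positives `{"a", "c"}`, the ranking `["a", "b", "c", "d"]` scores (1/1 + 2/3) / 2 ≈ 0.833, while a control ranking `["b", "d", "a", "c"]` scores (1/3 + 2/4) / 2 ≈ 0.417, giving an advantage of about +0.417.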

Primary evaluation is paired contrasts comparing each user's later bin advantage to their own 0–1 month advantage, removing survivor bias and cohort composition confounds. Cluster bootstrap by user (3,000 iterations) provides confidence intervals.
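A paired retention estimate with a user-level bootstrap might look like the sketch below. The exact definition of R is not given in this summary, so the ratio of mean later-bin advantage to mean 0–1 month advantage is an assumption, as are the percentile interval and the function names.

```python
import random
from statistics import mean

def retention(pairs):
    """pairs: one (early_advantage, later_advantage) tuple per user with both bins.

    Assumed definition: mean later advantage divided by mean early advantage.
    """
    early = mean([e for e, _ in pairs])
    later = mean([l for _, l in pairs])
    return later / early

def bootstrap_ci(pairs, iterations=3000, alpha=0.05, seed=0):
    """Percentile CI for retention, resampling whole users with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = []
    for _ in range(iterations):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        if mean([e for e, _ in resample]) == 0:
            continue  # ratio undefined for this resample; skip it
        stats.append(retention(resample))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi
```

Only users observed in both bins enter `pairs`, so the estimate compares each reader with themselves. That pairing is what keeps a changing cohort mix from masquerading as decay or stability.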

Additional analyses test stability by excluding profile domains from candidates to test domain repetition effects, splitting history into recent and old halves to measure drift, and prospectively ranking each reader's actual next documents months after profile freeze against several non-personal priors (popularity, neighborhood co-reading).
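The domain-exclusion check amounts to filtering candidates before ranking. A minimal sketch, assuming "domain" means the URL hostname with a leading `www.` stripped (the source does not say how domains are normalized):

```python
from urllib.parse import urlparse

def domain_of(url):
    """Hostname with a leading 'www.' removed (an assumed normalization)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def exclude_profile_domains(profile_urls, candidate_urls):
    """Drop every candidate whose domain already appears in the profile documents."""
    seen = {domain_of(u) for u in profile_urls}
    return [u for u in candidate_urls if domain_of(u) not in seen]
```

Ranking only the surviving candidates tests whether the profile still predicts selections on sources the reader had never highlighted during the profile window.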

Cohort criteria, bins, regimes, margins, and passing criteria were fixed internally before the measurement runs; some additive contrasts (the 12- and 24-month bins) were added post hoc. The pipeline and data remain private for privacy reasons, but aggregate statistics and code are available on request. Sample sizes vary by bin (n=102–360). Effect sizes are stable across sampling seeds.

A concrete example: A user with 20 sampled profile documents from months 0–6 is evaluated on their documents highlighted at months 12–24 compared to controls. AP scores and advantage differences are computed and bootstrapped to test if the profile retains predictive power over that distant horizon. This process repeats for all qualified users and bins.

Technical innovations

A temporal decay curve measuring individual reader selection distinctiveness up to >24 months on naturalistic, engagement-verified highlights.
A two-regime negative sampling design (coarse global vs. fine neighborhood peers) to separate broad topic identity from fine-grained personal identity.
Paired within-user retention contrasts that control for survivor and cohort-composition bias in identity decay measurement.
Held-out domain exclusion to isolate durable identity signals beyond repeated domain habits on unseen sources.

Datasets

Glasp social web highlighter highlighting logs — 191,223 user records scanned, with 405 heavy long-tenured users qualifying for analysis — proprietary platform data not publicly released

Baselines vs proposed

Lifetime popularity prior: AP = 0.229 vs personal profile whole history: AP = 0.704
Neighborhood co-reading popularity prior: AP = 0.192 vs personal profile earliest documents: AP = 0.659
Random baseline: AP = 0.227 vs personal profile recent half: AP = 0.705
Coarse (global negatives) regime: advantage +0.499 at 0–1 month vs fine (neighborhood negatives) regime: advantage +0.188 at 0–1 month
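The "roughly 3x" claim can be checked directly from the AP values listed above. The snippet below only divides the reported numbers; the dictionary keys are illustrative labels.

```python
# AP values as reported in the list above.
PERSONAL = {"whole_history": 0.704, "earliest": 0.659, "recent_half": 0.705}
BASELINES = {"lifetime_popularity": 0.229, "neighborhood_co_reading": 0.192, "random": 0.227}

def lift_table(personal, baselines):
    """Ratio of each personal-profile AP to each baseline AP, rounded to 2 places."""
    return {
        (p_name, b_name): round(p_ap / b_ap, 2)
        for p_name, p_ap in personal.items()
        for b_name, b_ap in baselines.items()
    }

lifts = lift_table(PERSONAL, BASELINES)
for (p_name, b_name), ratio in sorted(lifts.items()):
    print(f"{p_name} vs {b_name}: {ratio}x")
```

Every pairing lands between about 2.9x and 3.7x, so the summary figure holds whichever personal variant and baseline are compared.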

Limitations

The cohort is heavy, long-tenured users (1 in 472 records); durability among light or churned users is unmeasured and unobservable at long gaps.
Exposure and choice cannot be separated without impression logs; persistent personalized exposure loops could inflate observed stability.
The profile representation relies on simple frozen centroid embeddings from document titles and highlight-span embeddings, which may miss more nuanced drift detectable with advanced models.
Granularity of gap bins and profile window (6 months) is coarse; late horizon estimates (24+ months) have small sample size (n=65) and wide confidence intervals.
The negative sampling regime affects absolute advantage levels and interpretation, requiring anchoring on relative shapes and paired contrasts.
Findings are from a single platform (Glasp social highlighter) and a particular type of user (browser-extension heavy highlighters), limiting generalizability without replication.
Post-hoc additive contrasts (the 12- and 24-month paired tests) add uncertainty, even though the main 6–12 month retention readout was pre-specified.

Open questions / follow-ons

How does selection identity durability manifest in lighter or more casual readers with less sustained engagement?
Can stronger, more expressive embedding or sequence models reveal finer-grained drift in reading identity signatures?
How much of the durable signature reflects stable interests versus persistent, personalized content exposure loops beyond platform controls?
Would findings replicate on platforms with different user-base characteristics or recommendation architectures, including feed-ranked surfaces?

Why it matters for bot defense

For bot-defense and CAPTCHA engineers, this study highlights the long-term stability of individual content-selection signatures in a naturalistic setting. Applied to behavioral bot detection, it suggests that longitudinal profiles capturing fine-grained content preferences stay consistent over months to years, providing a persistent fingerprint for distinguishing genuine users from automated or ephemeral actors. The finding that the signal behaves as a stable trait rather than a transient state argues against overemphasizing recency in behavioral heuristics.

Furthermore, the methodology, notably the paired within-user contrasts and fine neighborhood negative selection, offers a rigorous framework for measuring the durability of behavioral signals over time while controlling for cohort effects and content drift. Bot-defense systems aiming to detect sophisticated evasive bots might leverage similar embedding-based personal identity fingerprints on interactions that resist drift for extended periods, improving resilience against adaptive adversaries mimicking surface-level behavior but failing to match deep, stable user profiles.

Cite

@article{arxiv2606_12904,
  title={Trait, Not State: The Durability of Reading Identity in Social Highlighting},
  author={Kazuki Nakayashiki and Keisuke Watanabe},
  journal={arXiv preprint arXiv:2606.12904},
  year={2026},
  url={https://arxiv.org/abs/2606.12904}
}
