Designing Conversations with the Dead: How People Engage with Generative Ghosts

Source: arXiv:2605.21390 · Published 2026-05-20 · By Jack Manning, Daniel Sullivan, Dylan Thomas Doyle, Anthony T. Pinter, Jed R. Brubaker

TL;DR

This study investigates user experiences with generative ghosts—AI chatbots trained on data about deceased individuals—focusing on how design choices about narrative point of view affect feelings of authenticity, emotional connection, and perceived risk. The researchers compare two modes: 'reincarnation,' where the AI speaks in the first person as if it were the deceased, and 'representation,' where it speaks about the deceased in the third person. Through a qualitative within-subject user study involving 16 bereaved participants interacting with both modes via a Wizard-of-Oz setup using GPT-4, the authors analyze conversational interactions and reflective interview responses. They find that reincarnation generates a stronger sense of immediacy and emotional closeness due to its direct first-person voice, but participants also express concerns about over-dependence. Representation allows for reflective memory engagement with emotional distance and is sometimes perceived as more authentic to the deceased's conversational style. Across modes, participants prioritize affective resonance—tone, rhythm, and linguistic familiarity—over factual accuracy. The study concludes that interactions with generative ghosts are inherently collaborative and shaped by the user's unique memories and emotional needs, suggesting opportunities for rich personalization beyond narrative point of view.

Key findings

Reincarnation mode (first-person) created a more immediate and emotionally intimate experience, with participants like P3 reporting feeling as if talking directly with their deceased loved one.
Representation mode (third-person) fostered reflective distance and invoked memories through descriptive narration, supporting a different form of emotional connection.
Participants often ignored the third-person framing in representation and engaged in dialogic interaction despite the narrative voice difference.
Participants prioritized affective resonance (tone, language, rhythm) over factual accuracy; authenticity hinged on whether language felt right rather than literal truth.
Misplaced word choices or phrase nuances (e.g., inappropriate greetings or culturally inaccurate terms) disrupted emotional immersion.
Both modes supported complementary emotional regulation: reincarnation for closeness and presence, representation for control and reflection.
Manual filtering slowed conversational pacing (60-90 s response latency), which was a limitation, although the filtering itself helped keep responses relevant and emotionally appropriate.
No participant reported a preference based exclusively on narrative viewpoint; the interpersonal relationship and emotional needs guided engagement.

Methodology — deep read

The researchers employed a qualitative within-subjects design to investigate how bereaved individuals experience generative ghosts simulated in two narrative modes: reincarnation (the AI speaking as the deceased in the first person) and representation (the AI speaking about the deceased in the third person).

Threat Model & Assumptions: The adversary model is not applicable since this is a user experience study. The underlying assumption is that participants have ready access to personal memories of the deceased and can perceive subtle linguistic cues.

Data: Sixteen participants (ages 22-50), who had all lost a close loved one (relatives or friends), were recruited via social media and snowball sampling. Each participant chose one deceased individual on whom the AI would be based. Intake surveys collected demographic data and detailed descriptions of the deceased's personality, communication style, and key traits.

Architecture / Algorithm: The system used a Wizard-of-Oz approach, where a human operator mediated chat conversations by generating AI responses via GPT-4 with prompt engineering to enforce either first-person or third-person narrative styles. The prompts seeded GPT-4 with participant-specific background data collected during intake. The operator lightly edited and filtered responses for relevancy and emotional appropriateness. This approach prioritized user safety and narrative coherence but limited conversational speed.
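The prompt-seeding step described above can be pictured with a minimal sketch. The study's actual prompts are not reproduced here, so everything below is an assumption made for illustration: the `IntakeProfile` fields, the wording in `MODE_INSTRUCTIONS`, and the `build_system_prompt` helper are hypothetical names, and no GPT-4 API call is made.

```python
from dataclasses import dataclass, field

# Hypothetical container for intake-survey answers (field names are assumptions).
@dataclass
class IntakeProfile:
    name: str
    personality: str
    communication_style: str
    key_traits: list[str] = field(default_factory=list)

# Illustrative wording only; not the prompts used in the study.
MODE_INSTRUCTIONS = {
    "reincarnation": "Speak in the first person as {name}, as if you were {name}.",
    "representation": "Speak about {name} in the third person; never speak as {name}.",
}

def build_system_prompt(profile: IntakeProfile, mode: str) -> str:
    """Combine participant-specific background with a point-of-view instruction."""
    if mode not in MODE_INSTRUCTIONS:
        raise ValueError(f"unknown mode: {mode!r}")
    lines = [
        MODE_INSTRUCTIONS[mode].format(name=profile.name),
        f"Personality: {profile.personality}",
        f"Communication style: {profile.communication_style}",
    ]
    if profile.key_traits:
        lines.append("Key traits: " + ", ".join(profile.key_traits))
    return "\n".join(lines)
```

In a Wizard-of-Oz arrangement like the one described, a string of this kind would seed the model, and the human operator would still review each draft reply before it reaches the participant.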

Training Regime: No model training occurred; the AI was used off-the-shelf with prompt priming and human-in-the-loop editing. The operator ensured adherence to the intended narrative point of view.

Evaluation Protocol: Each participant engaged in two 20-minute chat sessions, one per mode, with order randomized. Conversations occurred over Zoom chat. Afterward, participants completed semi-structured interviews reflecting on the experience and comparing modes. Data consisted of chat logs and interview transcripts. Iterative thematic analysis was performed using Braun and Clarke's approach combined with memoing and axial coding by two researchers. Saturation was achieved at 16 participants.
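The randomized session order can be sketched as follows. This is one plausible way to randomize, not the authors' documented procedure: the `assign_mode_orders` helper, the participant ID format, and the seed value are assumptions made for illustration.

```python
import random

MODES = ("reincarnation", "representation")

def assign_mode_orders(participant_ids, seed=0):
    """Give each participant both modes in an independently shuffled order."""
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    orders = {}
    for pid in participant_ids:
        order = list(MODES)
        rng.shuffle(order)
        orders[pid] = tuple(order)
    return orders

# Hypothetical IDs P1..P16, matching the sample size reported above.
participants = [f"P{i}" for i in range(1, 17)]
orders = assign_mode_orders(participants, seed=42)
```

Independent shuffling does not guarantee an even split between the two orders. A strictly counterbalanced design would instead assign each order to exactly half of the participants.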

Reproducibility: The generative content was human-mediated and bespoke for each participant based on personal histories, making exact replication difficult. However, the study protocol, prompts, and interview guides were provided. The use of commercial GPT-4 and the WoZ setup is fully described, but no code or dataset appears to have been publicly released.

Technical innovations

Use of an AI-assisted Wizard-of-Oz system combining GPT-4 with human filtering to simulate nuanced generative ghosts with distinct narrative modes.
Empirical comparison of narrative point of view (first-person reincarnation vs third-person representation) in generative systems based on deceased individuals.
Qualitative analysis linking linguistic features such as tone, rhythm, and phraseology with perceived authenticity beyond factual accuracy.
Identification of emotional resonance rather than factual fidelity as the central metric governing user acceptance and connection.

Datasets

Participant-generated personal descriptions of deceased loved ones — 16 participants — private and non-public.

Baselines vs proposed

Representation mode (emotional closeness varied across participants; some preferred it for its reflective distance) vs. reincarnation mode (stronger feelings of immediate presence and comfort, but concerns about over-reliance).
No quantitative metrics reported; findings are qualitative preferences and thematic code analysis.

Limitations

Small sample size (N=16) limits generalizability.
Wizard-of-Oz setup prevents testing fully autonomous AI-generated conversations and real-world scalability.
Slow response time (60-90 seconds) might have impacted conversational naturalness.
Study participants were primarily from Western, Anglo-American cultural contexts; results may not generalize globally.
Emotional responses might be influenced by researcher positionality and participant self-selection bias towards openness to AI interactions.
No adversarial testing or longitudinal evaluation of psychological impact or effect on grief processes.

Open questions / follow-ons

How would fully autonomous generative ghost systems affect user experience compared to the mediated Wizard-of-Oz approach?
Can dynamic switching between representation and reincarnation modes improve emotional regulation during grief?
What role can multimodal cues (voice, embodiment, facial expression) play in enhancing affective resonance and authenticity?
How do cultural differences influence preferences for narrative point of view and acceptance of generative ghosts?

Why it matters for bot defense

While not directly about CAPTCHA or bot defense, this research offers valuable insights into human-AI interaction where emotional authenticity and perceived agency are critical. For bot-defense and CAPTCHA practitioners, understanding how users interpret and emotionally respond to different AI conversational personas can inform the design of conversational AI challenges that must appear authentic but not overly intrusive or uncanny. The finding that affective resonance often trumps factual accuracy could guide the tuning of chatbot responses used for validation purposes, balancing human-like behavior against the risk of over-identification or manipulation. Additionally, the mixed preference for narrative POV suggests that interface framing (first vs third person) can materially alter perceived trustworthiness and risk, a consideration relevant when designing AI agents to detect or deter bots without disconcerting genuine users. Lastly, the importance of conversational rhythm, tone, and linguistic style supports incorporating behavioral cues beyond text correctness into bot detection heuristics.

Cite

@article{arxiv2605_21390,
  title={Designing Conversations with the Dead: How People Engage with Generative Ghosts},
  author={Jack Manning and Daniel Sullivan and Dylan Thomas Doyle and Anthony T. Pinter and Jed R. Brubaker},
  journal={arXiv preprint arXiv:2605.21390},
  year={2026},
  url={https://arxiv.org/abs/2605.21390}
}
