From Prompts to Preferences: An Open-Source Platform for Generative AI-Enhanced Conjoint Analysis

Source: arXiv:2606.12972 · Published 2026-06-11 · By Philipp Brauner

TL;DR

This paper addresses an accessibility gap in conjoint analysis, a widely used method for measuring preferences in marketing, political science, healthcare, and human-computer interaction. Commercial tools for conjoint design and analysis are common but often expensive and proprietary, which limits access for many researchers. Conventional conjoint surveys also rely on tabular stimulus presentation, which may inadequately represent complex, holistic alternatives. To address both gaps, the author presents GenerativeConjoint, an open-source, self-hosted web platform that supports the full survey lifecycle: design, generative AI-enhanced creation of textual and visual stimuli, deployment, and analysis, with export capabilities that support reproducibility.

GenerativeConjoint integrates large language models and text-to-image generators to produce scenario-based and visual stimuli that are parameterized by conjoint profiles and enriched with researcher-defined prompts and annotations. The platform features a setup wizard, AI-assisted attribute suggestion, and live preference analysis, lowering barriers for researchers new to conjoint methodology. A proof-of-concept study on care robot designs using AI-generated images in an ink-drawing style (N=55 participants, 8 tasks each) demonstrated the practical utility of the platform: the results show meaningful preference structures, with size and locomotion emerging as the key attributes. Participants rated the system highly for usability and stimulus clarity, and the archived stimuli and prompts make the study transparent.

The paper emphasizes that generative AI is a powerful tool for stimulus generation when grounded by rigorous theoretical design decisions by researchers. AI can reduce the cost and effort of producing rich, integrated stimuli that better capture participant engagement than traditional tabular formats, opening new methodological avenues in HCI and other applied domains.

Key findings

GenerativeConjoint supports 1–6 attributes with 2–4 levels each and uses a coordinate exchange algorithm to generate near D-efficient conjoint designs (achieving 0.61 D-efficiency in the validation study).
The platform provides three stimuli presentation modes: conventional tabular, LLM-generated textual scenario descriptions, and text-to-image generated visual stimuli.
The proof-of-concept conjoint study (N=55, UK participants) on preferences for ambient assisted living care robots used AI-generated ink-drawing images from 'gpt-image-2' as stimuli.
Robot size was the most important attribute at 41% relative importance, with medium size rated highest (+0.83 utility), and locomotion second at 28.5%.
Participants gave high ratings for system usability (mean 4.89/5), visual quality of the AI-generated images (mean 4.64/5), and clarity of the alternatives (mean 4.75/5).
The platform automatically exports full datasets including response data, design matrices, all stimuli and their generating prompts, supporting reproducibility.
Stimuli are generated once per profile and shared across participants, so any idiosyncratic generation artefact affects every participant who sees that profile and acts as a systematic rather than random measurement error.
AI-assisted attribute and level suggestions can be generated but require critical researcher review to maintain theoretical soundness.
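The relative importance figures above are derived from part-worth utilities. This summary does not state the exact formula the platform uses; a common convention, assumed here, is each attribute's part-worth range divided by the sum of all ranges. The part-worth values below are hypothetical and are not the study's estimates.

```python
def relative_importance(part_worths):
    """Return each attribute's share of the summed part-worth ranges, in percent.

    Assumes the range-based convention; the platform may compute it differently.
    """
    ranges = {
        attribute: max(levels.values()) - min(levels.values())
        for attribute, levels in part_worths.items()
    }
    total = sum(ranges.values())
    return {attribute: 100.0 * r / total for attribute, r in ranges.items()}


# Hypothetical effects-coded part-worths (each attribute sums to zero).
example_part_worths = {
    "size": {"small": -0.5, "medium": 0.75, "large": -0.25},
    "locomotion": {"wheels": 0.25, "legs": -0.5, "tracks": 0.25},
}

importance = relative_importance(example_part_worths)
for attribute, share in importance.items():
    print(f"{attribute}: {share:.1f}%")
```

Because the shares are normalized, they always sum to 100%, so an attribute's importance depends on which other attributes are in the design.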

Threat model

The platform assumes a non-adversarial setting where researchers control survey design and participant assignment. There is no provision for adversaries altering stimuli generation prompts, manipulating participant responses, or conducting attacks on AI components. The threat model is limited to ensuring methodological rigor and reproducibility rather than security against malicious actors.

Methodology — deep read

Threat Model & Assumptions: The platform assumes a non-adversarial research environment focused on valid preference elicitation. Adversarial manipulation of stimuli or responses is not addressed. It assumes that researchers own the survey design and control participant assignment, and that no attacker can alter prompts or stimulus generation once they are fixed.
Data: The platform supports survey designs with 1–6 attributes and 2–4 levels each. The proof-of-concept validation study recruited 55 UK participants via Prolific, who each completed 8 forced-choice tasks, selecting the preferred robot profile from 2 alternatives per task, for 440 choice tasks in total. Participants were aged 19–79 (median 44), with a balanced gender split.
Architecture/Algorithm: Experimental designs use a coordinate exchange algorithm that optimizes the determinant of the design information matrix (D-efficiency) over effects-coded attribute levels to produce near-orthogonal, balanced fractional factorial designs. Stimulus generation uses generative AI via OpenAI APIs: GPT-4o mini for textual scenario descriptions and gpt-image-2 for image stimuli. Base prompts are parameterized with conjoint profiles and LLM-facing annotations (hints). The web platform uses a Flask backend with an SQLite database and a Bootstrap frontend.
Training Regime: Not applicable, since the AI models are accessed externally via API. The platform runs on standard research infrastructure and stores designs and data internally.
Evaluation Protocol: The platform was evaluated through the robot care conjoint study, measuring attribute importance and part-worth utilities via conditional logit modeling of choice data. Usability and stimuli quality were assessed via Likert scales from 7 questionnaire items. The platform includes live analytics and summary statistics. Stimuli quality was manually reviewed with options to regenerate stimuli.
Reproducibility: The platform outputs a full export bundle containing the experimental design matrix, participant responses, all prompts and generated stimuli (text and images), plus a starter analysis script. All software and data from the validation study are open-source and archived at https://osf.io/6cqhx/, which supports end-to-end reproducibility of the analysis. Regenerating stimuli, however, depends on continued access to the OpenAI APIs and the underlying models.
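The coordinate exchange step named under Architecture/Algorithm can be illustrated with a minimal sketch. It is not the platform's implementation: it assumes a simple linear-model D-criterion over effects-coded profiles (rather than a choice-set criterion), a single random start, and hypothetical function names.

```python
import numpy as np


def effects_code(design, levels):
    """Effects-code an (n_rows, n_attrs) integer design; the last level is coded -1."""
    blocks = []
    for a, k in enumerate(levels):
        block = np.eye(k)[design[:, a]][:, : k - 1]  # indicator columns for levels 0..k-2
        block[design[:, a] == k - 1] = -1.0          # reference level row becomes all -1
        blocks.append(block)
    return np.hstack(blocks)


def d_criterion(design, levels):
    """det(X'X / n)^(1/p) for the effects-coded model matrix with an intercept."""
    x = np.hstack([np.ones((design.shape[0], 1)), effects_code(design, levels)])
    n, p = x.shape
    det = np.linalg.det(x.T @ x / n)
    return max(det, 0.0) ** (1.0 / p)


def coordinate_exchange(levels, n_rows, seed=0, max_passes=20):
    """Greedily swap one cell at a time, keeping swaps that raise the D-criterion."""
    rng = np.random.default_rng(seed)
    design = np.column_stack([rng.integers(0, k, size=n_rows) for k in levels])
    best = d_criterion(design, levels)
    for _ in range(max_passes):
        improved = False
        for r in range(n_rows):
            for a, k in enumerate(levels):
                current = design[r, a]
                for candidate in range(k):
                    if candidate == current:
                        continue
                    design[r, a] = candidate
                    score = d_criterion(design, levels)
                    if score > best + 1e-12:
                        best, current, improved = score, candidate, True
                design[r, a] = current  # keep the best level found for this cell
        if not improved:
            break  # a full pass without improvement: local optimum reached
    return design, best


# Four attributes with three levels each; 24 profile rows is an arbitrary choice.
design, score = coordinate_exchange(levels=[3, 3, 3, 3], n_rows=24)
print(design)
print(f"D-criterion: {score:.3f}")
```

The search only guarantees a local optimum, which is why practical implementations usually repeat it from several random starts and keep the best design. The value printed here is not on the same scale as the 0.61 D-efficiency reported for the study.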

Concrete end-to-end example: A researcher defines 4 robot attributes (design style, locomotion, size, location) with 3 levels each, giving 3^4 = 81 possible profiles. The coordinate exchange algorithm selects a near D-efficient subset of these profiles for the choice tasks. In visual mode, a base prompt describing an ink-drawing style robot scene is extended with each profile’s attribute levels plus LLM-facing hints. The text-to-image API is called once per profile, and the resulting stimulus images are stored. Recruited participants complete 8 forced-choice tasks, each a choice between 2 stimulus images. Responses are stored and analyzed with a conditional logit model, which reveals size as the dominant preference factor. The full dataset, design, stimuli, prompts, and analysis scripts are archived for transparency.
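The analysis step in this example can be sketched for the two-alternative case. With only two options per task, a conditional logit reduces to a logistic regression on attribute differences without an intercept. The code below is an illustrative estimator on simulated data, with hypothetical names and coefficients; it is not the starter analysis script shipped with the platform.

```python
import numpy as np


def fit_paired_logit(x_left, x_right, chose_left, n_iter=25):
    """Newton-Raphson maximum likelihood for a two-alternative conditional logit.

    P(left chosen) = sigmoid((x_left - x_right) @ beta).
    """
    d = x_left - x_right
    y = chose_left.astype(float)
    beta = np.zeros(d.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(d @ beta)))
        gradient = d.T @ (y - p)
        hessian = (d * (p * (1.0 - p))[:, None]).T @ d  # negative log-likelihood Hessian
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


# Simulated choices with two coded attributes and assumed true utilities.
rng = np.random.default_rng(42)
true_beta = np.array([1.0, -0.5])
n_tasks = 4000
x_left = rng.choice([-1.0, 0.0, 1.0], size=(n_tasks, 2))
x_right = rng.choice([-1.0, 0.0, 1.0], size=(n_tasks, 2))
p_left = 1.0 / (1.0 + np.exp(-((x_left - x_right) @ true_beta)))
chose_left = rng.random(n_tasks) < p_left

beta_hat = fit_paired_logit(x_left, x_right, chose_left)
print("estimated part-worth coefficients:", np.round(beta_hat, 2))
```

In a real study, the columns of `x_left` and `x_right` would be the effects-coded attribute levels from the exported design matrix, and repeated choices by the same participant would call for clustered or robust standard errors.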

Technical innovations

Integration of generative large language models and text-to-image models into an interactive conjoint survey platform for automatic, parametric stimulus generation.
Use of researcher-defined base prompts with structured conjoint profiles and LLM-facing level annotations to flexibly generate coherent textual and visual stimuli tailored to each alternative.
Open-source, self-hosted platform supporting full conjoint workflow from design through survey deployment, data collection, live preference analysis, and export of reproducible research bundles.
AI-assisted tools for attribute suggestion and prompt optimization embedded within the survey configuration to lower technical barriers for non-expert users.
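The prompt parameterization described above (a base prompt, a conjoint profile, and LLM-facing level annotations) can be illustrated with a small sketch. The function name, prompt layout, levels, and hint texts are all hypothetical; the platform's actual templates are in its export bundle, not in this summary. No API call is made here.

```python
def build_prompt(base_prompt, profile, hints):
    """Combine a base prompt with one line per attribute level and its optional hint."""
    lines = [base_prompt.strip(), "", "Profile:"]
    for attribute, level in profile.items():
        hint = hints.get((attribute, level))
        suffix = f" ({hint})" if hint else ""
        lines.append(f"- {attribute}: {level}{suffix}")
    return "\n".join(lines)


# Attribute names follow the study; levels and hints are made up for illustration.
base_prompt = "Ink drawing of a care robot assisting an older adult at home."
profile = {
    "design style": "humanoid",
    "locomotion": "wheels",
    "size": "medium",
    "location": "kitchen",
}
hints = {
    ("size", "medium"): "roughly waist height next to an adult",
    ("locomotion", "wheels"): "wheels clearly visible, no legs",
}

prompt = build_prompt(base_prompt, profile, hints)
print(prompt)
```

Keeping the hints separate from the level labels means participants-facing text can stay short while the generator receives the extra detail, and storing the assembled prompt next to each stimulus is what makes the stimuli auditable later.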

Datasets

Ambient Assisted Living Care Robot Preferences — 55 participants, 440 forced-choice tasks — collected via Prolific and hosted within GenerativeConjoint (openly archived at https://osf.io/6cqhx/)

Baselines vs proposed

Not applicable - no comparisons to other conjoint software platforms or stimulus generation approaches were quantitatively reported.

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.12972.

Fig 1

Fig 1: Screenshots of the GenerativeConjoint survey configuration wizard.

Fig 2

Fig 2: AI-generated stimuli (with ‘gpt-image-2’) from the robots for ambient assisted living study. Each image

Fig 3

Fig 3: Attribute importance (top) and part-worth utilities (bottom) for the robots for ambient assisted living conjoint

Fig 4

Fig 4 (page 7).

Fig 5

Fig 5 (page 7).

Fig 6

Fig 6 (page 7).

Limitations

Stimuli are generated once per profile and presented identically to all participants, creating potential systematic measurement error from idiosyncratic stimuli artefacts.
AI-generated attribute and level suggestions lack domain-specific theoretical grounding and should be critically vetted by researchers to avoid invalid designs.
The platform currently supports only text and static image stimuli; richer formats like video or audio are not yet integrated.
The proof-of-concept study is small-scale (N=55), which limits statistical power and generalizability.
No adversarial evaluation or robustness testing against manipulations of AI generation or participant response behavior was performed.
Dependence on proprietary external AI APIs (OpenAI) means long-term sustainability and reproducibility depend on those services.

Open questions / follow-ons

How might variability and randomness in AI-generated stimuli across participants be incorporated to model and reduce systematic measurement error?
Can generative AI be extended to produce richer multimodal stimuli (e.g., audio, video) applicable to conjoint analysis and preserve participant engagement?
What methods can ensure theoretical validity of AI-suggested attributes and levels in domains with complex latent constructs?
How can the platform be adapted to detect or mitigate potential biases or artefacts introduced by generative AI in stimulus presentation formats?

Why it matters for bot defense

Bot-defense practitioners and CAPTCHA system designers may find this work relevant as an example of leveraging generative AI to produce rich stimuli reflecting complex attribute combinations in user interaction contexts. Although focused on preference elicitation rather than adversarial detection, the methodology for coupling controlled profile descriptions with AI-generated textual or visual stimuli provides a blueprint for dynamically crafting challenge content that is both structured and varied.

The platform’s emphasis on reproducibility and archivable prompting is instructive when deploying AI-generated challenges where transparency and traceability are desired. However, the paper does not address security concerns such as robustness to automated solver bots or adversarial manipulation of generative prompts that are common in bot defense. Its contribution lies more in detailing a practical architecture and workflow for integrating generative AI in systematic survey and experimental design, which could inspire similar rigour in CAPTCHA stimulus generation pipelines.

Cite

bibtex

@article{arxiv2606_12972,
  title={From Prompts to Preferences: An Open-Source Platform for Generative AI-Enhanced Conjoint Analysis},
  author={Philipp Brauner},
  journal={arXiv preprint arXiv:2606.12972},
  year={2026},
  url={https://arxiv.org/abs/2606.12972}
}
