LLM-Mediated Demand Response Coordination in Smart Microgrids
Source: arXiv:2606.11050 · Published 2026-06-09 · By J. de Curtò, I. de Zarzà
TL;DR
This paper addresses voluntary demand response coordination in smart microgrids, modeled as a repeated Prisoner's Dilemma on a social network of prosumers. The key innovation is a hybrid multi-agent simulation framework in which prosumer agents combine game-theoretic strategic reasoning with narrative evaluation from a Large Language Model (LLM) Influence Compiler. Prior LLM-agent approaches show unrealistic universal cooperation due to RLHF alignment bias. This hybrid design instead separates a strategic base cooperation probability from bounded LLM-driven influence shifts, enabling nuanced agent heterogeneity and realistic cooperation dynamics. Compiled, structured demand-response directives consistently raise cooperation: 33.3% final cooperation versus 27.0% for unstructured messaging and 28.0% with no intervention, a +0.063 advantage over unstructured messaging. A positive compilation advantage persists across agent realism levels and resistance parameters. Hub-targeted dissemination that exploits the scale-free network structure further amplifies coordination gains, independent of message content. These findings suggest complementary roles for structured LLM message compilation, grounded game-theoretic modeling, and topology-aware influence targeting in scalable, interpretable smart-grid demand-response strategies.
Key findings
- Compiled structured directives achieve 33.3% final cooperation, vs 27.0% for unstructured messaging and 28.0% with no intervention: a +0.063 advantage over unstructured messaging and +0.053 over no intervention (Fig 1).
- On the idealized logistic agent substrate, cooperation is higher overall, reaching 85.0% under compilation vs 76.7% unstructured; the compilation advantage grows to +0.083, confirming robustness across agent realism (Table 1, Fig 2).
- Hub and bridge targeting yield final cooperation rates of 33.3%, outperforming periphery (30.7%) and random targeting (27.0%) under compiled influence (Table 2, Fig 3).
- Cooperation rates remain stable across prosumer resistance levels R ∈ {0.1, 0.3, 0.5, 0.7}, with the compilation advantage ranging from +0.057 to +0.070, so the benefit degrades gracefully rather than collapsing as resistance rises (Table 3, Fig 4).
- Six personality archetypes produce differentiated cooperation propensities: idealists highest at 44.7%, conformists 38.3%, opportunists lowest at 22.4%, validating heterogeneity (Fig 5 left).
- Persuasion success is strongly inversely correlated with agent resistance: low-resistance idealists are persuaded 85-100% of the time, skeptics and opportunists less than 5% of the time, showing that resistance attenuation works as intended (Fig 5 center).
- High-degree hub agents tend to have higher final cooperation, consistent with conformist imitation dynamics reinforcing social learning (Fig 5 right).
- Unstructured free-form messaging performs slightly worse than no influence (27.0% vs 28.0%), indicating that poorly designed narratives can backfire (Experiment 1).
Threat model
The adversary is the self-interested prosumer agent deciding whether to cooperate (curtail demand) or defect. Agents know their own payoff history and local neighbor cooperation, and receive limited coordination signals through the Influence Compiler's directives. Agents resist influence according to fixed, personality-specific resistance parameters, but cannot coerce other prosumers or manipulate the network structure. The model assumes no external coercion and no adversarial attackers disrupting communications or altering signals.
Methodology — deep read
Threat Model & Assumptions: The adversary is implicit — prosumers act strategically with self-interest, deciding to cooperate (curtail demand) or defect (consume as normal). No coercion or enforcement is possible, only voluntary, signal-based coordination. Agents have heterogeneous personalities and resistance to messaging.
Data: The study simulates a population of N=30 prosumer agents on a scale-free network generated via Barabási-Albert preferential attachment with m=3. Each agent is randomly assigned one of six archetypes. Simulations run for T=50 time steps, with influence signals broadcast every 5 steps to 20% of agents (6 agents).
Architecture/Algorithm: Agents use a hybrid decision architecture. The base cooperation probability p_base combines game-theoretic factors such as archetype bias, exploitation history, neighbors' cooperation, the payoff difference between defecting and cooperating, and temporal decay of temptation; the combined score is passed through a logistic function.
Separately, an LLM Influence Compiler generates structured natural-language demand-curtailment directives for selected agents. A targeted agent receives an LLM narrative evaluation delta (δ_LLM) in [-0.30, +0.30], indicating a shift in willingness to cooperate based on the message content and the agent's memory and persona. This delta is attenuated by the agent's resistance (r_o in [0,1]) to yield an effective shift. The final cooperation probability is clipped and sampled to yield a binary cooperate/defect action at each step.
Training Regime: Not applicable. This is a simulation study: the decision model's parameters are pre-specified and the LLM is queried as-is, with no ML training.
Evaluation Protocol: The primary metrics are the final average cooperation rate (averaged over the last 10 rounds), the persuasion rate (fraction of influence events producing a positive shift), and the backlash rate (fraction producing a negative shift). Four experiments compare: compiled vs unstructured vs no influence; grounded vs idealized agent substrates; targeting strategies (hub, bridge, periphery, random); and a resistance parameter sweep. Results are averaged over multiple stochastic runs.
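A minimal, standard-library sketch of the population setup: a Barabási-Albert graph with N=30 and m=3, the broadcast schedule, and the number of agents targeted per broadcast. It assumes a complete graph on m+1 nodes as the seed graph, 1-indexed time steps (so broadcasts land on steps 5, 10, ..., 50), and rounding for the 20% target count. The function names are illustrative, not taken from the authors' repository.

```python
import random


def barabasi_albert(n: int, m: int, seed: int = 0) -> dict:
    """Grow an undirected scale-free graph by preferential attachment.

    Assumption: the seed graph is a complete graph on m + 1 nodes.
    Each later node links to m distinct existing nodes, sampled with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # A node appears here once per edge endpoint, so a uniform draw
    # from this list is a degree-proportional draw.
    endpoints = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend([new, t])
    return adj


def broadcast_steps(t_max: int = 50, every: int = 5) -> list:
    """Steps (1-indexed, an assumption) on which the compiler broadcasts."""
    return [t for t in range(1, t_max + 1) if t % every == 0]


def n_targets(n_agents: int, fraction: float = 0.2) -> int:
    """Number of agents reached per broadcast (20% of 30 gives 6)."""
    return max(1, round(fraction * n_agents))


if __name__ == "__main__":
    graph = barabasi_albert(30, 3, seed=42)
    degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
    print("edges:", sum(degrees) // 2, "top degrees:", degrees[:5])
    print("broadcasts:", broadcast_steps(), "targets each:", n_targets(30))
```

Preferential attachment is what produces the few high-degree hubs that the targeting experiments exploit.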
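A minimal sketch of the base-probability computation, assuming a linear log-odds combination of the listed factors. The feature names, signs, weights (W_EXPLOIT, W_SOCIAL, W_TEMPT), and the centering of neighbor cooperation at 0.5 are hypothetical choices for illustration, not values from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentState:
    archetype_bias: float    # hypothetical: > 0 for idealists, < 0 for opportunists
    exploited: float         # fraction of recent rounds cooperating while neighbors defected
    neighbor_coop: float     # fraction of neighbors who cooperated last round
    payoff_gap: float        # normalized payoff(defect) - payoff(cooperate)
    temptation_scale: float  # in [0, 1]; shrinks as temptation decays over time


# Hypothetical weights for illustration only; not taken from the paper.
W_EXPLOIT, W_SOCIAL, W_TEMPT = 1.5, 2.0, 1.0


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def p_base(s: AgentState) -> float:
    """Combine game-theoretic factors into a log-odds score, then squash to (0, 1)."""
    z = (s.archetype_bias
         - W_EXPLOIT * s.exploited                         # being exploited lowers willingness
         + W_SOCIAL * (s.neighbor_coop - 0.5)              # imitate cooperative neighborhoods
         - W_TEMPT * s.temptation_scale * s.payoff_gap)    # decaying temptation to defect
    return logistic(z)


if __name__ == "__main__":
    idealist = AgentState(archetype_bias=0.8, exploited=0.1, neighbor_coop=0.4,
                          payoff_gap=0.5, temptation_scale=0.7)
    print(round(p_base(idealist), 3))
```

Keeping the strategic part in this closed form is what lets the LLM contribute only a bounded shift on top, rather than deciding the action outright.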
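One way to write this update for agent i, assuming resistance scales the LLM delta multiplicatively (the exact attenuation form is not given in this summary). The clipping bounds 0.02 and 0.98 come from the walkthrough.

```latex
\begin{aligned}
\delta_{\mathrm{eff},i} &= (1 - r_{o,i})\,\delta_{\mathrm{LLM},i},
  \qquad \delta_{\mathrm{LLM},i} \in [-0.30,\, 0.30],\quad r_{o,i} \in [0, 1] \\
p_i &= \min\!\bigl(0.98,\ \max\bigl(0.02,\ p_{\mathrm{base},i} + \delta_{\mathrm{eff},i}\bigr)\bigr),
  \qquad a_i \sim \mathrm{Bernoulli}(p_i)
\end{aligned}
```

For example, with p_base = 0.30 and δ_LLM = +0.20, a low-resistance agent (r_o = 0.1) gets δ_eff = 0.18 and p = 0.48, while a high-resistance agent (r_o = 0.9) gets δ_eff = 0.02 and p = 0.32.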
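A minimal sketch of the three metrics, assuming per-round cooperation is stored as a list of 0/1 actions per agent and each influence event records its effective shift. Events with exactly zero shift count as neither persuasion nor backlash; that tie rule is an assumption.

```python
from statistics import mean


def final_cooperation(history, last: int = 10) -> float:
    """Mean cooperation rate over the last `last` rounds.

    history[t][i] is 1 if agent i cooperated in round t, else 0.
    """
    return mean(sum(row) / len(row) for row in history[-last:])


def persuasion_backlash(shifts):
    """Fractions of influence events with a positive / negative effective shift."""
    if not shifts:
        return 0.0, 0.0
    n = len(shifts)
    return sum(s > 0 for s in shifts) / n, sum(s < 0 for s in shifts) / n


def average_over_runs(runs, last: int = 10) -> float:
    """Average the final cooperation rate across independent stochastic runs."""
    return mean(final_cooperation(h, last) for h in runs)
```

Averaging only the tail of each run reports where cooperation settles rather than the transient after the first broadcasts.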
Reproducibility: The authors provide a GitHub repository with all code, datasets (simulated), and scripts for experiment reproduction, including caching for LLM calls. The LLM used is Llama-3.3-70B-Instruct accessed via API with seed-controlled prompts.
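A minimal sketch of the kind of LLM-call caching mentioned above, assuming responses are keyed on the model name, seed, and full prompt text and stored in a local JSON file. The class name, file layout, and the `call` function (standing in for whatever API client queries Llama-3.3-70B-Instruct) are hypothetical; no specific provider API is implied.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, Union


class LLMCache:
    """JSON-file cache so reruns replay identical LLM responses (hypothetical helper)."""

    def __init__(self, path: Union[str, Path]):
        self.path = Path(path)
        self.data: Dict[str, str] = (
            json.loads(self.path.read_text(encoding="utf-8")) if self.path.exists() else {}
        )

    @staticmethod
    def key(model: str, seed: int, prompt: str) -> str:
        # Canonical JSON, so the same (model, seed, prompt) always hashes identically.
        payload = json.dumps({"model": model, "seed": seed, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, seed: int, prompt: str,
                    call: Callable[[str, int, str], str]) -> str:
        k = self.key(model, seed, prompt)
        if k not in self.data:  # cache miss: query once, then persist to disk
            self.data[k] = call(model, seed, prompt)
            self.path.write_text(json.dumps(self.data), encoding="utf-8")
        return self.data[k]
```

With a real client plugged in as `call`, each unique prompt is paid for once; later runs with the same model, seed, and prompts read from the file, which is what makes the experiments replayable.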
Concrete Example Walkthrough: At timestep t, the Influence Compiler observes a noisy snapshot of the population state and issues a structured directive to 6 agents chosen by the targeting condition (e.g., hubs). Each targeted agent receives the directive as text together with its personal payoff and memory context. The LLM evaluates this input and produces a δ_LLM shift, which modifies the base cooperation probability computed from payoffs and social learning. The resulting probability is clipped to [0.02, 0.98] and sampled to yield a binary action. Non-targeted agents receive weaker signals via word-of-mouth. Over repeated rounds, cooperation evolves under this hybrid influence model, showing realistic personality-dependent heterogeneity and network effects.
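The walkthrough can be condensed into a single update function. This is a sketch under stated assumptions: attenuation is multiplicative, word-of-mouth passes a fixed fraction `spill` (hypothetical, default 0.5) of a targeted neighbor's delta to non-targeted agents, and p_base is taken as already computed. Only the [-0.30, +0.30] delta bound and the [0.02, 0.98] clip come from the text.

```python
import random


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def step(adj, p_base, resistance, targets, llm_delta, spill=0.5, rng=None):
    """One round of the hybrid update (a sketch, not the authors' code).

    adj: node -> set of neighbors; p_base, resistance: node -> float;
    targets: set of directly targeted nodes; llm_delta: target -> delta_LLM.
    `spill` is a hypothetical word-of-mouth factor for non-targeted neighbors.
    """
    rng = rng or random.Random()
    shift = {i: 0.0 for i in adj}
    for i in targets:
        d = clamp(llm_delta[i], -0.30, 0.30)        # bounded LLM evaluation
        shift[i] += (1.0 - resistance[i]) * d       # resistance attenuation
        for j in adj[i]:
            if j not in targets:                    # weaker second-hand signal
                shift[j] += spill * (1.0 - resistance[j]) * d
    probs = {i: clamp(p_base[i] + shift[i], 0.02, 0.98) for i in adj}
    actions = {i: int(rng.random() < p) for i, p in probs.items()}  # 1 = cooperate
    return probs, actions
```

The clip keeps every agent slightly stochastic, so even a fully persuaded agent can still defect occasionally and a fully resistant one can still cooperate.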
Technical innovations
- Hybrid decision architecture combining evolutionary game-theoretic base probabilities with bounded LLM narrative evaluation shifts to overcome RLHF-induced universal cooperation bias in LLM-driven agents.
- Use of a structured Influence Compiler that generates policy-constrained natural language demand-response directives via a Solver–Critic pipeline ensuring policy alignment, contrasting free-form narrative baselines.
- Explicit modeling of heterogeneous prosumer personality archetypes with individualized resistance parameters affecting narrative influence attenuation.
- Network-aware targeting strategies (hub, bridge, periphery, random) demonstrating mechanistic amplification from scale-free topology independent of message content.
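The four targeting conditions can be sketched as ranking rules over the graph. Hub and periphery rankings by degree follow from their names; treating "bridge" agents as those with the highest betweenness centrality is an assumption about the paper's definition. Ties are broken by node id so results are deterministic.

```python
import random
from collections import deque


def betweenness(adj):
    """Brandes' algorithm: unnormalized betweenness for an unweighted, undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                                   # BFS counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):                      # back-propagate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}         # each pair was counted twice


def select_targets(adj, k, strategy, rng=None):
    """Pick k agents to receive the compiled directive."""
    nodes = sorted(adj)
    if strategy == "hub":
        return sorted(nodes, key=lambda v: (-len(adj[v]), v))[:k]
    if strategy == "periphery":
        return sorted(nodes, key=lambda v: (len(adj[v]), v))[:k]
    if strategy == "bridge":
        bc = betweenness(adj)
        return sorted(nodes, key=lambda v: (-bc[v], v))[:k]
    if strategy == "random":
        return (rng or random.Random()).sample(nodes, k)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Because these rules look only at graph structure, any gain from hub or bridge targeting is by construction independent of message content.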
Datasets
- Simulated prosumer population — 30 agents — generated scale-free network with Barabási-Albert preferential attachment (m=3)
Baselines vs proposed
- No intervention baseline: cooperation rate = 0.280 final vs compiled: 0.333 final (+0.053 lift)
- Unstructured messaging baseline: cooperation rate = 0.270 final vs compiled: 0.333 final (+0.063 lift)
- Idealized agents unstructured: final cooperation 0.767 vs idealized compiled: 0.850 (+0.083 lift)
- Random targeting (compiled): final cooperation 0.270 vs hub targeting (compiled): 0.333 (+0.063 lift)
- Periphery targeting (compiled): final cooperation 0.307 vs hub targeting (compiled): 0.333 (+0.026 lift)
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.11050.

Fig 1: Experiment 1: Compiled versus unstructured versus no-influence conditions with grounded hybrid agents.

Fig 2: Compiled (blue) versus unstructured (red dashed) influence under idealized and grounded agent substrates.

Fig 3: Experiment 3: Cooperation rate by targeting strategy over time (left) and final cooperation rate comparison (right).

Fig 4: Compiled (blue) versus unstructured (red dashed) directives across resistance levels R ∈{0.1, 0.3, 0.5, 0.7}.

Fig 5: Agent-level decomposition of the population-level results: cooperation propensity by archetype (left), persuasion success versus resistance (center), and final cooperation versus node degree (right).
Limitations
- Small simulated population size (N=30), limiting generalizability to large-scale microgrids.
- Single network realization per condition—no reported variation across multiple graph samples or network topologies beyond scale-free.
- Use of LLMs to proxy human prosumer behavior without validation in human-in-the-loop or real-world settings.
- Resistance modeled as fixed agent-level scalar—dynamic or context-dependent resistance not explored.
- Only limited agent personality attributes modeled; realism of archetypes and payoffs uncertain.
- Potential biases or artifacts from Llama-3.3-70B-Instruct and the API calling environment are not fully characterized.
Open questions / follow-ons
- How do results scale with larger populations and multiple heterogeneous microgrid topologies (e.g., small-world, random graphs)?
- How well do hybrid LLM-agent coordination models align with actual human prosumer behavior in controlled experiments or field pilots?
- Can adaptive or dynamic resistance models reflecting agent learning improve realism and coordination outcomes?
- What governance frameworks or constitutional AI constraints effectively prevent manipulative or inequitable influence by the LLM compiler in practice?
Why it matters for bot defense
From a bot-defense and CAPTCHA perspective, this paper illustrates a sophisticated LLM-mediated multi-agent influence paradigm, blending strategic reasoning with large language model narrative evaluation to affect behavior in networked populations. The hybrid approach mitigates the universal cooperation bias seen when using RLHF-aligned LLMs as direct decision-makers by separating base strategic reasoning from narrative persuasion shifts.
The insights on structured versus unstructured messaging, network-aware targeting, and resistance attenuation may inform the design of influence signals or challenge-response mechanisms that balance interpretability, strategic rationality, and heterogeneity in human or bot populations. The finding that unstructured natural-language messaging can backfire suggests that careful message design, and possibly schema enforcement, may be critical to effective coordination or bot containment.
Finally, the scale-free topology and the targeting of network hubs for maximal influence parallel botnet command-and-control and Sybil attack dynamics, giving defenders a useful conceptual framework for studying influence propagation and targeted mitigation. The hybrid architecture itself could inform the design of challenge schemata that blend automated reasoning and natural language to detect or influence adversarial actors that appear cooperative due to strong priors but may behave strategically.
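Schema enforcement of this kind can be sketched as a Solver–Critic loop in the spirit of the Influence Compiler's pipeline: a solver drafts a directive, a critic checks it against a fixed schema and policy, and non-compliant drafts are retried or dropped. The Directive fields, allowed actions, and 30% cap are hypothetical; the paper's actual schema and critic prompts are not reproduced here, and a real solver would be an LLM call rather than a Python function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

ALLOWED_ACTIONS = {"curtail", "shift"}   # hypothetical policy vocabulary
MAX_REDUCTION_PCT = 30.0                 # hypothetical policy cap


@dataclass
class Directive:
    action: str
    reduction_pct: float
    start_hour: int
    end_hour: int
    rationale: str


def critique(d: Directive) -> List[str]:
    """Critic: list every schema or policy violation (an empty list means accept)."""
    problems = []
    if d.action not in ALLOWED_ACTIONS:
        problems.append(f"action {d.action!r} not allowed")
    if not 0.0 < d.reduction_pct <= MAX_REDUCTION_PCT:
        problems.append("reduction_pct out of range")
    if not 0 <= d.start_hour < d.end_hour <= 24:
        problems.append("invalid time window")
    if not d.rationale.strip():
        problems.append("missing rationale")
    return problems


def compile_directive(solve: Callable[[List[str]], Directive],
                      max_rounds: int = 3) -> Optional[Directive]:
    """Solver proposes, critic checks; feedback is passed back until accepted or budget runs out."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        draft = solve(feedback)
        feedback = critique(draft)
        if not feedback:
            return draft
    return None  # refuse to broadcast rather than send a non-compliant message
```

Returning None instead of the last draft is a deliberate choice in this sketch: a dropped message cannot backfire the way a malformed free-form one can.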
Cite
@article{arxiv2606_11050,
title={LLM-Mediated Demand Response Coordination in Smart Microgrids},
author={J. de Curtò and I. de Zarzà},
journal={arXiv preprint arXiv:2606.11050},
year={2026},
url={https://arxiv.org/abs/2606.11050}
}