EnThM: Energy Theft Mitigation in Smart Grids using Hierarchical Verification of Metering Data

Source: arXiv:2605.24951 · Published 2026-05-24 · By Tapadyoti Banerjee, Pabitra Mitra, Dipanwita Roy Chowdhury

TL;DR

The paper addresses electricity theft in smart grids, a persistent problem with financial, operational, and security consequences for power distribution networks. It proposes EnThM, a lightweight, communication-efficient, real-time verification scheme that uses the hierarchical structure of smart grids to authenticate metering data at multiple distribution levels. EnThM combines statistical modeling of cumulative averages of power usage with rule-based checks on aggregated consumption, explicitly accounting for seasonal and daily variation in energy use. After an initial authentication, the scheme provides periodic key-less authentication and real-time validation of consumption data rather than relying on repeated cryptographic key exchanges.

The method was tested on publicly available benchmark household consumption data from the UCI repository supplemented with attack scenarios from an Industrial Control System (ICS) cyberattack dataset. Results demonstrate that EnThM successfully identifies anomalous usage patterns indicative of theft while being computationally efficient for real-time deployment. Its hierarchical verification aligns well with the smart grid architecture to scale verification from smart meters to neighborhood and control center levels. This approach contrasts with existing machine learning methods that are more computationally intensive and less suited for real-time detection, highlighting EnThM’s practical applicability for power utilities concerned with energy theft mitigation.

Key findings

EnThM tracks cumulative averages over sliding windows taken at the current time and 12 months earlier to account for seasonal variation, improving detection accuracy.
Validation threshold functions r1(t) and r2(t) define adaptive current limits, with detection triggered if reported global intensity exceeds these bounds (Eq. 7).
Using UCI household consumption data from 2006–2010 (over 2 million measurements), the system detected synthetic forged current spikes accurately without false positives in nominal periods.
Rate-of-change parameter α is calculated from the mode of transformed current values within overlapping current windows, enabling dynamic threshold adjustment.
Hierarchical verification matches smart grid architecture: each smart meter’s data is verified by its parent node (HAN->BAN->NAN->Control Center), reducing communication and computation overhead at the control center.
The approach avoids computationally heavy ML models, enabling efficient real-time theft detection on streaming smart meter data.
Experimental results on a selected 24-hour window show global intensity values maintained within thresholds in normal conditions, and deviations flagged with corresponding anomaly scores >1 indicating theft.

Threat model

Adversaries are cyber attackers capable of intercepting, forging, replaying, and injecting falsified smart meter messages across wireless communication channels within the smart grid hierarchy. They aim to manipulate consumption data to hide electricity theft without physical meter tampering. The network nodes themselves (smart meters, gateways, control centers) are assumed trusted, but the communication links are adversarial.

Methodology — deep read

Threat Model & Assumptions: The adversaries are cyberattackers who can eavesdrop on, inject, replay, and forge communications between smart meters and their gateways, following the Dolev-Yao threat model. The attacker aims to manipulate metering data to conceal theft; direct physical tampering with meters is out of scope. The nodes of the network hierarchy (Control Center->NAN->BAN->HAN->Smart Meter) are trusted, but the communication channels can be intercepted.
Data: The main dataset is the UCI Machine Learning Repository's individual household electric power consumption data (~2,075,259 measurements over 47 months, 2006–2010), from which the timestamped global intensity (current) is used. To simulate attacks, the ICS Cyber Attack Dataset from Oak Ridge National Laboratory was used; it contains power disturbances and cyberattack scenarios (line faults, relay attacks, command injections, data injection). The data was segmented into one-minute averages.
Algorithm and Design: EnThM computes cumulative averages of global intensity below and above a baseline current Ib over sliding windows of length T, taken at the current time ti and at ti-B (12 months earlier), to capture seasonal and daily variation. Two functions, R1(ti) and R2(ti), represent the averages below and above Ib, respectively. A dynamically calculated rate-of-change parameter α (derived from a modal statistic of current deviations normalized against IMAX) is then used to define the threshold functions r1(ti) and r2(ti), which give adaptive lower and upper bounds on the expected current. The incoming reading GI(ti+T+1) from a smart meter is verified against the interval [r1, r2], which is based on the cumulative distribution function of a uniform distribution; a reading outside this interval is flagged as an anomaly.
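The windowed statistics described above can be sketched roughly as follows. This is a minimal illustration, not the paper's equations: the summary does not give the formulas for α, r1, or r2, so the way the two windows are combined, the rounding used for the mode, and the scaling of the bounds by α are all assumptions made here for illustration. The function names are hypothetical; only Ib = 5 A and IMAX = 30 A come from the summary.

```python
from statistics import mean, mode

I_B = 5.0     # baseline current Ib in amps (value given in the summary)
I_MAX = 30.0  # upper bound IMAX in amps (value given in the summary)


def split_averages(window, i_b=I_B):
    """Return (mean of readings <= Ib, mean of readings > Ib) for one window."""
    below = [x for x in window if x <= i_b]
    above = [x for x in window if x > i_b]
    # Assumed fallbacks for an empty side; the summary does not specify any.
    return (mean(below) if below else 0.0, mean(above) if above else i_b)


def rate_of_change(window, i_b=I_B, i_max=I_MAX):
    """Hypothetical alpha: mode of |x - Ib| / IMAX after rounding to 2 decimals."""
    normalized = [round(abs(x - i_b) / i_max, 2) for x in window]
    return mode(normalized)


def adaptive_bounds(current, prior_year):
    """Combine the current and 12-months-earlier windows into (r1, r2).

    Assumption: the two windows are averaged with equal weight, and alpha
    widens the interval symmetrically. The paper may define this differently.
    """
    cur_low, cur_high = split_averages(current)
    old_low, old_high = split_averages(prior_year)
    avg_low = (cur_low + old_low) / 2    # stands in for R1(ti)
    avg_high = (cur_high + old_high) / 2  # stands in for R2(ti)
    alpha = rate_of_change(current)
    lower = max(0.0, avg_low * (1 - alpha))
    upper = min(I_MAX, avg_high * (1 + alpha))
    return lower, upper


# Made-up readings in amps, purely for illustration.
current_window = [2.0, 4.0, 8.0, 10.0]
prior_window = [3.0, 5.0, 9.0, 11.0]
r1, r2 = adaptive_bounds(current_window, prior_window)
```

The point of the sketch is the data flow: two windows in, one interval out, with no model training step. Any real deployment would substitute the paper's own definitions for the assumed ones.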

The hierarchical grid structure allows recursive verification: HANs verified by BANs, BANs by NANs, NANs by Control Center. Verification is periodic and key-less after initial authentication.
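A rough sketch of this recursive pattern, under stated assumptions: each parent holds an interval for every child and checks the child's reported value against it before the check descends a level. The `Node` class, its fields, and the example tree are hypothetical; the summary does not describe the message format or how aggregate bounds are derived at the BAN and NAN levels.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    reported: float   # value this node reports to its parent (amps)
    lower: float      # r1-style bound the parent holds for this node
    upper: float      # r2-style bound the parent holds for this node
    children: list["Node"] = field(default_factory=list)


def verify_subtree(parent: Node) -> list[str]:
    """Each parent checks its direct children, then the check recurses downward."""
    flagged = []
    for child in parent.children:
        if not (child.lower <= child.reported <= child.upper):
            flagged.append(f"{parent.name} flags {child.name}")
        flagged.extend(verify_subtree(child))
    return flagged


# Hypothetical topology and values, chosen only to exercise the check.
meter_a = Node("meter-A", 8.0, 1.0, 15.5)
meter_b = Node("meter-B", 25.0, 1.0, 15.5)   # out-of-range reading
han = Node("HAN-1", 33.0, 2.0, 40.0, [meter_a, meter_b])
ban = Node("BAN-1", 33.0, 2.0, 60.0, [han])
nan = Node("NAN-1", 33.0, 2.0, 90.0, [ban])
control_center = Node("ControlCenter", 33.0, 0.0, 0.0, [nan])  # root bounds unused

alerts = verify_subtree(control_center)
```

In this toy tree the forged meter reading is caught one level up, at the HAN gateway, so the control center only needs to learn about the alert rather than inspect every meter itself.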

Training & Parameter Setting: This method is statistical and rule-based, so it requires no ML training. Parameters a (lower bound), b=IMAX=30 Amps (upper bound), and Ib=5 Amps (basic current reference) are set according to industry standards (IEC 61036). Window length T and window shift are chosen empirically to balance responsiveness and smoothness.
Evaluation Protocol: The system was run over a 24-hour window, after which a forged current value exceeding the r2 threshold was injected as a test case (25 Amps against a threshold of 15.5 Amps). The detection decision of Eq. 7 returned invalid, demonstrating successful theft detection. Additional tests cover diurnal and seasonal variation (weekday vs weekend, summer vs winter) to check robustness.
Reproducibility: The authors mention simulation and FPGA-based testing but do not disclose code or weights publicly, citing security concerns of critical infrastructure. Data is from public repositories, enabling partial replication.

Example end-to-end: The system receives minute-wise global intensity GI(t) from a household smart meter. It calculates R1 and R2 averages over current and prior year windows, computes rate-of-change α, then derives adaptive bounds r1, r2. A query GI value outside [r1,r2] triggers anomaly detection and flags potential theft for provider action.
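The final decision step can be illustrated with the numbers quoted in the evaluation (a forged 25 A reading against an upper threshold of 15.5 A). This is a sketch of one plausible reading of the interval check, not a reproduction of Eq. 7: the lower threshold below is a made-up placeholder because the summary does not report one, and the function names are hypothetical. The uniform CDF is included only to show where a reading sits relative to the interval.

```python
def uniform_cdf(x: float, low: float, high: float) -> float:
    """CDF of the continuous uniform distribution on [low, high]."""
    if high <= low:
        raise ValueError("high must exceed low")
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def verify_reading(gi: float, r1: float, r2: float) -> str:
    """Return 'valid' when r1 <= gi <= r2, otherwise 'invalid'."""
    return "valid" if r1 <= gi <= r2 else "invalid"


R2_REPORTED = 15.5  # upper threshold quoted in the evaluation (amps)
FORGED_GI = 25.0    # forged reading quoted in the evaluation (amps)
R1_ASSUMED = 1.0    # hypothetical lower threshold; not given in the summary

decision = verify_reading(FORGED_GI, R1_ASSUMED, R2_REPORTED)
position = uniform_cdf(FORGED_GI, R1_ASSUMED, R2_REPORTED)
```

With these inputs the forged reading lies above the interval, so the check rejects it regardless of the placeholder chosen for the lower threshold.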

Technical innovations

Hierarchical verification leveraging smart grid topology enables scalable, multi-level authentication of metering data without repeated cryptographic key exchanges.
Dynamic thresholding using cumulative averages combined with a rate-of-change parameter computed from historical and seasonal smart meter data adapts to consumption patterns for robust anomaly detection.
Use of the continuous uniform distribution's cumulative distribution function (CDF) as the basis for the verification interval enables simple, interpretable statistical validation rather than opaque ML models.
Key-less, periodic authentication embedded in consumption verification allows real-time detection without the computational overhead of model retraining or extensive feature extraction.

Datasets

UCI Individual Household Electric Power Consumption Dataset — ~2,075,259 samples — public (UCI Machine Learning Repository)
Industrial Control System (ICS) Cyber Attack Dataset — size unspecified — public (Oak Ridge National Lab via Kaggle)
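For readers who want to experiment with the UCI data, the global intensity series can be pulled out with the standard library alone. The sketch below assumes the layout of the publicly distributed text file as commonly documented: semicolon-separated columns, day/month/year dates, and `?` for missing values. The sample rows are made up for illustration and are not taken from the dataset; the function name is hypothetical.

```python
import csv
import io
from datetime import datetime

# Illustrative rows only (made-up values), laid out like the UCI text file.
SAMPLE = """Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3
21/11/2010;10:00:00;1.200;0.100;240.000;5.000;0.000;0.000;1.000
21/11/2010;10:01:00;?;?;?;?;?;?;?
21/11/2010;10:02:00;2.400;0.120;239.500;10.200;0.000;1.000;17.000
"""


def read_global_intensity(handle):
    """Return a list of (timestamp, amps) pairs, skipping missing readings."""
    reader = csv.DictReader(handle, delimiter=";")
    series = []
    for row in reader:
        raw = row["Global_intensity"]
        if raw in ("", "?"):
            continue  # missing measurement
        stamp = datetime.strptime(
            f"{row['Date']} {row['Time']}", "%d/%m/%Y %H:%M:%S"
        )
        series.append((stamp, float(raw)))
    return series


series = read_global_intensity(io.StringIO(SAMPLE))
```

To use a real copy of the file, pass an open file handle in place of the `io.StringIO` wrapper.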

Baselines vs proposed

Support Vector Machine (SVM) classifiers for electricity theft detection: computationally intensive, unsuitable for real-time — EnThM: real-time, lightweight, achieves comparable detection accuracy with statistical thresholding
Double metering real-time detection [24]: high implementation cost and feasibility concerns — EnThM: low cost, key-less periodic verification on existing smart grid infrastructure

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.24951.

Fig 4 (page 9): Comparing energy consumption patterns.

Fig 5 (page 9): Calculation of the threshold functions r1(t) and r2(t) in the time window 10:00, 21st November, 2010 to 11:50, 21st November, 2010.

Fig 6: Anomaly detection at time 11:55, 21st November, 2010.

Limitations

Lack of deployment on live smart grid infrastructure limits validation under operational constraints and adversarial conditions.
Evaluation focuses primarily on detection accuracy from benchmark and simulated data; comprehensive adversarial robustness testing (e.g., adaptive attackers) is missing.
The threshold intervals based on the uniform-distribution CDF assume that legitimate consumption fits the modeled parameters; unusual but genuine consumption patterns may cause false alarms without additional context.
Parameter choices (window length T, rate-of-change α computation) may require tuning for different locales and grid topologies.
No explicit cryptographic protocol design beyond key-less authentication claim; security is mostly statistical validation rather than cryptographic proof.
Real-time responsiveness and communication overhead not quantified in large-scale deployments.

Open questions / follow-ons

How does the approach perform under stealthy adaptive attackers who manipulate consumption gradually to stay within thresholds?
Can the hierarchical verification algorithm be extended to include explicit cryptographic authentication while maintaining communication efficiency?
What is the impact of integrating additional data sources (voltage, reactive power) into the statistical model for improved theft detection?
How would the method scale quantitatively in terms of communication overhead, computational latency, and false positive rates in large heterogeneous grids?

Why it matters for bot defense

While this work focuses on electricity theft detection in smart grids, the principle of hierarchical, multi-level verification of streamed data and adaptive thresholding based on statistical modeling is broadly applicable to bot defense in networked systems. Bot defense systems often face communication constraints similar to smart grids and require real-time anomaly detection with low false positives. The hierarchical verification pattern could inspire scalable client verification across network layers without expensive cryptographic handshakes. Furthermore, the use of interpretable statistical thresholds accommodating temporal patterns may be beneficial in detecting sophisticated bot behavior that attempts to mimic normal usage patterns with seasonal or diurnal variations. However, the domain-specific nature of the data and threat model means direct application would require modification for human-computer interaction signals typical in CAPTCHA contexts.

Cite

bibtex

@article{arxiv2605_24951,
  title={EnThM: Energy Theft Mitigation in Smart Grids using Hierarchical Verification of Metering Data},
  author={Tapadyoti Banerjee and Pabitra Mitra and Dipanwita Roy Chowdhury},
  journal={arXiv preprint arXiv:2605.24951},
  year={2026},
  url={https://arxiv.org/abs/2605.24951}
}
