Cepstral Analysis to accelerate Green-Kubo thermal conductivity calculations of Metal-Organic Frameworks

Source: arXiv:2606.13588 · Published 2026-06-11 · By Florian P. Lindner, Egbert Zojer, Sandro Wieser

TL;DR

This paper addresses the computational challenge of accurately and efficiently predicting the thermal conductivities of metal-organic frameworks (MOFs), porous materials with complex hybrid organic-inorganic structures that are important for gas storage and separation technologies. Traditional equilibrium molecular dynamics (MD) simulations using the Green-Kubo (GK) formalism suffer from high statistical noise and require careful, often system-specific, user-defined parameters to extract converged thermal conductivity values, which impairs reproducibility and automation. The authors propose augmenting GK simulations with cepstral analysis, a signal processing technique that works with the inverse Fourier transform of the logarithm of the power spectrum to separate noise from meaningful spectral features. Applied to three benchmark MOFs (MOF-5, HKUST-1, and ZIF-8) modeled with machine-learned moment tensor potentials (MTPs) trained on density functional theory (DFT) data, the cepstral method reduces the total sampling time needed for convergence to 1-2 ns and yields stable results that are largely independent of ad hoc parameter choices. This demonstrates that cepstral analysis enables automated, robust predictions of near-ab initio quality for the low thermal conductivities of complex MOFs, overcoming the statistical noise bottleneck in GK heat transport calculations.

Key findings

Cepstral analysis applied to GK simulations reduces required sampling times to ~1-2 ns total trajectory length versus tens of ns commonly needed with direct GK analysis.
Cepstral approach produces thermal conductivity estimates that are stable across a wide range of chosen correlation lengths and analysis window sizes, unlike direct GK integrals, which show erratic convergence (Fig 7).
Machine-learned MTPs trained on ~1000-1800 DFT reference structures per MOF achieve force RMSEs of 41-66 meV/Å compared to DFT across 100-500 K validation sets.
Conventional GK heat flux autocorrelation functions (HFACFs) for MOFs decay rapidly (<10 ps), but noise dominates beyond that point, causing the running integral to behave like a random walk without a clear plateau (Figs 3, 4).
Statistical noise floor estimated from off-diagonal heat flux cross-correlations (W(t)) closely matches the observed fluctuations in the HFACFs, confirming noise dominance.
Cepstral method estimates thermal conductivity as the zero-frequency limit of the spectral density (power spectrum) of heat flux, overcoming instabilities in time-domain integration.
Applying cepstral analysis to MOF-5, HKUST-1, and ZIF-8 yields thermal conductivities consistent with available experimental values, demonstrating accuracy of combined ML potential and cepstral-GK simulation workflow.
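For orientation, the link between the time-domain Green-Kubo integral and the zero-frequency limit named in the findings above can be written out. This is the standard textbook form rather than a transcription of the paper's equations. It assumes that J_α is one Cartesian component of the heat flux density (heat current divided by the cell volume V); prefactors shift if a different flux convention is used.

```latex
\kappa_{\alpha\alpha} = \frac{V}{k_B T^2} \int_0^{\infty} \langle J_\alpha(t)\, J_\alpha(0) \rangle \, dt ,
\qquad
S_\alpha(\omega) = \int_{-\infty}^{\infty} \langle J_\alpha(t)\, J_\alpha(0) \rangle \, e^{-i\omega t} \, dt ,
\qquad
\kappa_{\alpha\alpha} = \frac{V}{2 k_B T^2}\, S_\alpha(\omega = 0)
```

Because the HFACF is even in time, the one-sided integral equals half of the two-sided Fourier transform at ω = 0, so estimating S(0) is equivalent to locating the plateau of the running integral. For an isotropic (cubic) material the scalar conductivity is the average of the three diagonal components.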

Threat model

The work treats the statistical noise inherent to finite-length molecular dynamics simulations as the main 'adversary' to accurate thermal conductivity estimation. The limitations stem from finite sampling, ergodicity assumptions, and stochastic fluctuations of the heat flux. There is no adversary who can manipulate data maliciously; rather, the focus is on robust estimation under uncertainty and noise.

Methodology — deep read

Threat Model & Assumptions: The adversary in this context is the statistical noise and finite sampling inherent in molecular dynamics simulations of heat transport in MOFs. The method assumes that the system is ergodic and that the thermal conductivity can be derived from equilibrium heat flux fluctuations via the Green-Kubo formalism. No malicious actors are considered; the challenge is the stochastic uncertainty in the computed autocorrelations.
Data: Reference DFT data were generated using VASP with the PBE functional and D3(BJ) dispersion correction, Γ-point sampling, and a 900 eV cutoff. Active learning generated ~974 (MOF-5), 1431 (HKUST-1), and 1809 (ZIF-8) DFT-evaluated atomic configurations from MD with the temperature ramped from 50 K to 900 K. A validation dataset of 450 DFT structures sampled from NpT ensemble trajectories at 100, 300, and 500 K was used to evaluate the MTP accuracy.
Architecture / Algorithm: Moment Tensor Potentials (MTPs) were employed as machine-learned interatomic potentials, representing the total energy as a weighted combination of moment tensor descriptors of local atomic environments. The MTPs used level 18 and a radial basis size of 12 and were fitted by minimizing a weighted sum of squared errors in energies (weight 1.0), forces (0.01), and stresses (1.0). Separate parameter sets were maintained for distinct atomic environments. Ten independent MTP models were trained, and the one offering the best combination of accuracy and stability was selected.
Training Regime: Active learning was performed 'on the fly' during MD at variable temperatures, triggering DFT calculations for atomic environments with high Bayesian error estimates. The final MTPs were then built by passive retraining on the collected data. Training runs used the default parameters of the MLIP-2 package; training seeds and model selection are detailed in the SI.
Evaluation Protocol: Green-Kubo MD simulations used LAMMPS with a timestep of 0.5 fs. Supercells of 3x3x3 conventional unit cells were used for convergence. Ten independent, 1 ns NVE trajectories at 300 K followed NVT equilibration. Heat fluxes were computed using the corrected Hardy formalism adapted for many-body potentials (MTP). Power spectral density of heat flux was analyzed via cepstral methods using a specialized code (GK_analysis). Validation included comparison of force RMSEs to DFT and testing different analysis window sizes and smoothing window parameters.
Reproducibility: Code for the cepstral analysis is publicly available (from a co-author). The ML potential workflow relies on VASP, which requires a proprietary license, together with the openly available LAMMPS and MLIP-2 packages. The DFT reference datasets were generated internally and are presumably not fully public, so full reproduction requires access to these datasets and training pipelines. Hyperparameter values and methodological details are well documented.
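The direct Green-Kubo route described in the evaluation protocol can be sketched in a few lines. The following is a minimal, standard-library illustration and not the authors' GK_analysis code: the function names, the synthetic AR(1) "flux" series, and the omission of the physical prefactor (volume, temperature, Boltzmann constant) are all assumptions made for illustration.

```python
import random


def autocorrelation(series, max_lag):
    """Time-averaged autocorrelation <J(0) J(t)> for lags 0 .. max_lag-1."""
    n = len(series)
    return [
        sum(series[i] * series[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag)
    ]


def running_integral(acf, dt):
    """Cumulative trapezoidal integral of the ACF; entry m covers [0, m*dt]."""
    total = 0.0
    out = [0.0]
    for a, b in zip(acf, acf[1:]):
        total += 0.5 * (a + b) * dt
        out.append(total)
    return out


# Synthetic stand-in for one heat-flux component (hypothetical data):
# an AR(1) process whose true autocorrelation decays exponentially.
random.seed(1)
phi = 0.9
flux = [0.0]
for _ in range(1999):
    flux.append(phi * flux[-1] + random.gauss(0.0, 1.0))

acf = autocorrelation(flux, 200)
kappa_like = running_integral(acf, dt=0.5)  # proportional to kappa(t)
```

Plotting `kappa_like` against the lag for several seeds shows the behavior described above: the curve rises while the true correlation is still non-zero and then wanders as only noise is accumulated, so the value read off depends on where the integral is cut.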

Example: For MOF-5, a 1 ns equilibrium MD trajectory yields a heat flux time series sampled at 0.5 fs intervals. The heat flux autocorrelation function (HFACF) decays within ~10 ps but is dominated by noise beyond that. Direct integration of the HFACF therefore shows random-walk behavior instead of a plateau. Cepstral analysis transforms the logarithm of the heat flux power spectrum into the cepstral domain, where the noise can be separated from the signal, giving a stable estimate of the zero-frequency spectral value and hence a converged thermal conductivity with uncertainty estimates from ~1-2 ns of data. The procedure is largely free of user-defined parameters and produces robust results across independent runs, unlike traditional integration, which relies on heuristic smoothing, windowing, and plateau selection.
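The cepstral step in this example can be illustrated with a small, standard-library sketch. It follows the general recipe of cepstral analysis (periodogram, logarithm, inverse Fourier transform, truncation, bias correction) and is not the authors' GK_analysis code. All function names are hypothetical, the naive O(N^2) transform is only suitable for short demonstration series, and the number of retained coefficients is passed in by hand, whereas the paper selects it with an Akaike information criterion (Fig 6).

```python
import cmath
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function psi(n) for a positive integer n."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def periodogram(series, dt):
    """Naive periodogram (dt/N) * |DFT|^2 of one flux component."""
    n = len(series)
    out = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * j / n)
                  for j, x in enumerate(series))
        out.append(dt / n * abs(acc) ** 2)
    return out


def cepstral_log_s0(components, dt, n_coeffs):
    """Estimate ln S(0) from the first n_coeffs cepstral coefficients.

    components: equivalent, independent flux components
    (e.g. the three Cartesian directions of a cubic crystal).
    """
    n = len(components[0])
    ell = len(components)
    spectra = [periodogram(c, dt) for c in components]
    mean_spec = [sum(s[k] for s in spectra) / ell for k in range(n)]
    log_spec = [math.log(v) for v in mean_spec]

    # The log-spectrum is real and even, so its inverse DFT is a cosine sum.
    def coeff(m):
        return sum(v * math.cos(2 * math.pi * k * m / n)
                   for k, v in enumerate(log_spec)) / n

    c = [coeff(m) for m in range(n_coeffs)]
    # Averaging ell periodograms biases the mean log by psi(ell) - ln(ell).
    bias = digamma_int(ell) - math.log(ell)
    return c[0] + 2.0 * sum(c[1:]) - bias


# Hypothetical test signal: white noise, for which S(0) = sigma^2 * dt.
random.seed(0)
dt, sigma = 0.5, 2.0
comps = [[random.gauss(0.0, sigma) for _ in range(256)] for _ in range(3)]
log_s0 = cepstral_log_s0(comps, dt, n_coeffs=1)
s0 = math.exp(log_s0)
```

The design point is that the logarithm turns the multiplicative noise of the periodogram into additive noise with a known distribution, so dropping high-order cepstral coefficients acts as a statistically controlled low-pass filter. Multiplying `s0` by the appropriate Green-Kubo prefactor (which depends on the flux convention and units) would then give the thermal conductivity.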

Technical innovations

Application of cepstral analysis to Green-Kubo heat flux data of low-thermal-conductivity MOFs, enabling robust, noise-mitigated thermal conductivity estimates with substantially reduced sampling time versus prior methods.
Integration of machine-learned moment tensor potentials trained via active learning on DFT data for MOF heat transport MD simulations providing ab initio-level accuracy at much lower computational cost.
Demonstration that cepstral analysis removes critical dependence on user-defined ad hoc parameters such as smoothing windows and integration cutoffs, improving reproducibility and facilitating automation.
Use of corrected Hardy heat flux formalism adapted for many-body machine-learned potentials ensuring physically consistent flux and thermal conductivity calculations.

Datasets

MOF-5 DFT training set — ~974 structures — generated via active learning from ab initio MD
HKUST-1 DFT training set — ~1431 structures — generated via active learning from ab initio MD
ZIF-8 DFT training set — ~1809 structures — generated via active learning from ab initio MD
Validation set — 450 structures sampled from MD trajectories at 100K, 300K, 500K with corresponding DFT single-point calculations

Baselines vs proposed

Direct Green-Kubo integral (1 ns trajectory): the thermal conductivity estimate fluctuates without converging, whereas the cepstral estimate converges within 1-2 ns of total sampling
Force RMSE for MTP models vs DFT at 300K: MOF-5 = 47 meV/Å, HKUST-1 = 41 meV/Å, ZIF-8 = 66 meV/Å, indicating high accuracy of ML potentials
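The force RMSE quoted above is a simple per-component error metric. A minimal sketch of how such a number can be computed is given below; the function name and the per-component convention are assumptions, since the summary does not state exactly how the paper averages over atoms and structures.

```python
import math


def force_rmse(predicted, reference):
    """RMSE over all Cartesian force components.

    predicted, reference: equal-length lists of (fx, fy, fz) tuples,
    one per atom, in the same units (e.g. eV/Angstrom).
    """
    squared = [
        (p - r) ** 2
        for f_pred, f_ref in zip(predicted, reference)
        for p, r in zip(f_pred, f_ref)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical two-atom example; multiply by 1000 to convert eV/A to meV/A.
mtp_forces = [(0.00, 0.00, 0.00), (1.00, 1.00, 1.00)]
dft_forces = [(0.00, 0.00, 0.00), (0.00, 0.00, 0.00)]
rmse = force_rmse(mtp_forces, dft_forces)
```

Applied to the validation structures, the same calculation would be run over the concatenated forces of all atoms in all configurations at a given temperature.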

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.13588.

Fig 1: Structures of the investigated MOFs together with their experimental conventional

Fig 2: Validation of the MTPs used to model the three MOF systems against a test set

Fig 3: Example for the time averaged HFACF of MOF-5 calculated using eq. (5) for a full

Fig 4: Numerical integration of the HFACF obtained from the 1 ns trajectory for MOF-5,

Fig 5: Power spectrum of the HFACF for MOF-5 as provided in Figure 3. The blue line

Fig 6: Akaike information criterion AICc (a) with global minimum (black dashed line) and

Fig 7: Time convergence behavior of the calculated thermal conductivity of MOF-5 obtained

Fig 8: Analysis of the influence of the extraction point and the -smoothing window width

Limitations

ML potentials trained on specific MOF structures with fixed chemical environments — transferability to other MOFs or guest-loaded systems untested.
Validation limited to three prototypical MOFs; generality across broad MOF families or disordered structures remains to be demonstrated.
Cepstral analysis performance for materials with higher thermal conductivity (slowly decaying autocorrelations) may be less straightforward and is not deeply analyzed here.
Reproducibility depends on access to the trained ML potentials and DFT reference datasets, which do not appear to be publicly available.
Simulations do not explicitly include guest molecules or adsorbates, which may affect heat transport and introduce convection complicating flux analysis.

Open questions / follow-ons

How well does cepstral analysis generalize to MOFs with guest molecules or dynamic adsorbates influencing heat flux via convection?
Can the cepstral-GK framework be extended or adapted to materials with higher thermal conductivity and longer autocorrelation times without loss of effectiveness?
What is the impact of structural defects, disorder, or amorphous phases typical in practical MOFs on cepstral analysis performance?
How transferable are the trained MTP potentials across different MOFs, and can ML potential training be further automated to support high-throughput thermal transport screening?

Why it matters for bot defense

While this paper does not focus on security or CAPTCHA problems, it offers insights for bot-defense practitioners interested in managing noise and uncertainties in automated measurement or classification tasks. The use of cepstral analysis to separate signal from noise in time series data provides a principled, parameter-minimal approach to improve robustness and reproducibility, which is critical in automated systems requiring minimal human tuning.

For CAPTCHA engineers, the lesson is that replacing heuristic, user-defined smoothing parameters with approaches grounded in signal processing and statistical modeling can facilitate more reliable detection or classification in noisy continuous streams. Techniques like cepstral analysis may find analogous applications in bot activity time series or network traffic anomaly detection where noise complicates direct integration or thresholding methods.

Cite

bibtex

@article{arxiv2606_13588,
  title={Cepstral Analysis to accelerate Green-Kubo thermal conductivity calculations of Metal-Organic Frameworks},
  author={Florian P. Lindner and Egbert Zojer and Sandro Wieser},
  journal={arXiv preprint arXiv:2606.13588},
  year={2026},
  url={https://arxiv.org/abs/2606.13588}
}
