Cloud Computing Review: A Decade of Research
Source: arXiv:2605.24499 · Published 2026-05-23 · By Smruti Rekha Swain
TL;DR
This paper presents a large-scale scientometric analysis of cloud computing research published over the last decade (2014-2023), using 12,638 articles sourced from the Web of Science Core Collection. Using the CiteSpace tool for co-citation, co-word, and collaboration network analyses, the study maps the structure, trends, and influential contributors of the cloud computing domain. It identifies five main research domains (Mobile Edge Computing, Fog Computing, Secure Cloud Storage, Workflow Scheduling, and Federated Learning) and analyzes their publication growth, geographical distribution, and key topic clusters.
Compared with prior scientometric reviews, which covered earlier periods or narrower topics, this study analyzes a much larger dataset (12,638 articles versus a few thousand at most) and pairs log-likelihood ratio (LLR) clustering with network metrics (modularity and silhouette scores) to check cluster reliability. Federated learning is singled out as the fastest-growing and currently most-researched domain within cloud computing. The authors identify the most prolific authors (e.g., Rajkumar Buyya), countries (China, USA, India), and journals (IEEE Access) contributing to the field. Keyword analyses highlight resource allocation, IoT, edge computing, big data, and security as dominant research themes.
Overall, the paper advances understanding of the evolutionary patterns and knowledge structure of cloud computing research and suggests future directions including institution-level analyses and expanded scientometric methods.
Key findings
- Dataset of 12,638 cloud computing articles published 2014-2023, sourced from the Web of Science Core Collection.
- Five major research domains identified via co-citation clustering: Mobile Edge Computing (3,540 articles), Fog Computing (2,089), Secure Cloud Storage (1,366), Workflow Scheduling (1,407), Federated Learning (4,133).
- Federated Learning is the most researched domain by volume and exhibited the highest growth rate, increasing from 7 publications in 2014 to 1,791 in 2023 (~46% of domain publications).
- Country collaboration network has 127 nodes and 1,211 links. China leads in publication count (5,616) but has low betweenness centrality (0.03), while England (0.16) and Saudi Arabia (0.14) have higher centrality, which the study reads as a proxy for quality, despite fewer publications.
- Top cited authors include Rajkumar Buyya (83 citations), Kim-Kwang Raymond Choo (71), and Ximeng Liu (67), with centrality scores up to 0.17 indicating key connectors in author networks.
- Top journals by citation count are IEEE Access (5,154 citations) and Future Generation Computer Systems (5,046); the highest impact factor among the core journals belongs to IEEE Communications Surveys and Tutorials (35.6).
- Keyword frequency analysis shows prominent topics: 'cloud computing' (5,713 occurrences), 'edge computing' (1,329), 'internet of things' (1,261), 'resource allocation' (792), 'fog computing' (791), 'big data' (591), and 'security' (531).
- Document co-citation network (1,422 nodes, 6,929 links) with modularity Q=0.6876 and silhouette score S=0.8686 indicates reliable clustering of underlying research topics.
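The Federated Learning growth figures above can be turned into a single yearly rate. The sketch below computes a compound annual growth rate from the two endpoint counts quoted in the findings; the rate itself is a derived illustration, not a number reported in the paper, and the function name is hypothetical.

```python
def compound_annual_growth_rate(first: float, last: float, periods: int) -> float:
    """Return the constant yearly growth rate that turns `first` into `last`."""
    if first <= 0 or periods <= 0:
        raise ValueError("first and periods must be positive")
    return (last / first) ** (1.0 / periods) - 1.0


# Federated Learning publication counts quoted in the findings above.
fl_2014, fl_2023 = 7, 1791
years_elapsed = 2023 - 2014  # nine year-over-year steps

fl_cagr = compound_annual_growth_rate(fl_2014, fl_2023, years_elapsed)
print(f"Federated Learning CAGR 2014-2023: {fl_cagr:.1%}")
```

Under these assumptions the rate comes out at roughly 85% per year. Because it depends only on the two endpoints, it says nothing about how evenly that growth was spread across the decade.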
Methodology — deep read
Threat Model & Assumptions: Not applicable as this is a bibliometric and scientometric meta-analysis. The study assumes that scientific activity and research trends in cloud computing can be quantitatively measured by analyzing publications and citations.
Data: The authors retrieved publications from Clarivate's Web of Science Core Collection from 2014 through 2023 using the query ALL('Cloud Computing') combined with subqueries for specific domains. From an initial 69,142 records, they filtered by document type ('Article'), language (English), subject area ('Computer Science, Information Systems'), and removed duplicates using CiteSpace to finalize a dataset of 12,638 research articles. The dataset thus represents peer-reviewed publications in scholarly journals and conferences relevant to cloud computing.
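The record-selection step can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes records have been exported as dictionaries keyed by the Web of Science field tags `DT` (document type), `LA` (language), `WC` (categories), `PY` (year), and `UT` (accession number), and it deduplicates on `UT`, whereas the paper performed deduplication inside CiteSpace. The function name and sample records are hypothetical.

```python
from typing import Iterable

WANTED_CATEGORY = "Computer Science, Information Systems"


def filter_records(records: Iterable[dict]) -> list[dict]:
    """Keep English articles from 2014-2023 in the target category, one per UT."""
    seen_ids: set[str] = set()
    kept: list[dict] = []
    for rec in records:
        if rec.get("DT") != "Article" or rec.get("LA") != "English":
            continue
        # WC may list several categories separated by semicolons.
        categories = [c.strip() for c in rec.get("WC", "").split(";")]
        if WANTED_CATEGORY not in categories:
            continue
        if not 2014 <= int(rec.get("PY", 0)) <= 2023:
            continue
        uid = rec.get("UT")
        # Records without an accession number, or already seen, are dropped.
        if uid is None or uid in seen_ids:
            continue
        seen_ids.add(uid)
        kept.append(rec)
    return kept


# Hypothetical sample records for illustration only.
sample = [
    {"UT": "WOS:1", "DT": "Article", "LA": "English", "PY": "2019",
     "WC": "Computer Science, Information Systems; Telecommunications"},
    {"UT": "WOS:1", "DT": "Article", "LA": "English", "PY": "2019",
     "WC": "Computer Science, Information Systems"},  # duplicate UT
    {"UT": "WOS:2", "DT": "Review", "LA": "English", "PY": "2020",
     "WC": "Computer Science, Information Systems"},  # wrong document type
    {"UT": "WOS:3", "DT": "Article", "LA": "English", "PY": "2012",
     "WC": "Computer Science, Information Systems"},  # outside the window
]
filtered = filter_records(sample)
print([r["UT"] for r in filtered])
```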
Architecture / Algorithm: The core technique is scientometric analysis via visualization and network modeling using the CiteSpace software. The main methods include document co-citation analysis to identify research clusters, author co-citation analysis to reveal influential researchers, journal co-citation to find prominent publication outlets, and co-word (keyword) analysis to track hot topics. Clustering uses log-likelihood ratio (LLR) algorithms optimized for high intra-cluster similarity. Modularity (Q) and silhouette (S) scores are computed to evaluate cluster quality.
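To make document co-citation concrete: two references are co-cited whenever a single paper cites both, and the number of papers doing so becomes the weight of the link between them. The sketch below counts those pairs from reference lists; it illustrates the general idea only, not CiteSpace's internal implementation, and the paper and reference labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how many citing papers list each pair of references together."""
    counts = Counter()
    for refs in reference_lists:
        # De-duplicate within a paper and sort so each pair has one canonical key.
        unique_refs = sorted(set(refs))
        for pair in combinations(unique_refs, 2):
            counts[pair] += 1
    return counts


# Each inner list is the reference list of one hypothetical citing paper.
papers = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "C"],  # a repeated reference counts once per citing paper
]
counts = cocitation_counts(papers)
for pair, weight in sorted(counts.items()):
    print(pair, weight)
```

The resulting weighted pairs form the edge list of a co-citation network, which is the kind of structure that clustering and labeling then operate on.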
Training Regime: Not applicable; no machine learning training is involved. CiteSpace was configured to select the top 50 items per time slice, with cluster-identification thresholds set according to common practice for citation burst detection and cluster validity.
Evaluation Protocol: Scientometric metrics analyzed include publication counts per year by domain, country collaboration networks (nodes = countries, edges = co-publishing links), centrality (betweenness centrality) as a proxy for influence/quality, citation frequency of authors, journals, and papers, cluster modularity and silhouette for validation, and keyword co-occurrence frequencies with citation burst detection to identify emerging trends. The results include annual temporal trends, density of co-citation networks, cluster thematic tags, and collaboration patterns.
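The modularity score used for cluster validation can be illustrated on a toy graph. The sketch below assumes the standard Newman-Girvan definition for an unweighted, undirected graph, Q = sum over clusters of [L_c/m - (d_c/2m)^2], where m is the total number of edges, L_c the number of edges inside cluster c, and d_c the sum of degrees in c. Whether CiteSpace computes Q in exactly this form is not stated in the summary, so treat this as an assumption; the graph and function name are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity for an unweighted, undirected edge list."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal_edges = defaultdict(int)  # L_c: edges with both ends in cluster c
    degree_sum = defaultdict(int)      # d_c: total degree of nodes in cluster c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal_edges[cu] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: "left", 1: "left", 2: "left", 3: "right", 4: "right", 5: "right"}
q = modularity(edges, community_of)
print(f"Q = {q:.4f}")
```

Higher values of Q mean that more edges fall inside clusters than a random graph with the same degrees would produce. Putting every node in one cluster gives Q = 0 under this definition.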
Reproducibility: The study relies on the proprietary Web of Science database and the paid CiteSpace 6.2 R6 software. The WoS queries used to segment by research domain and the filtering approach are well documented, but there is no code release or public dataset snapshot. Publication records are accessible only via a WoS subscription. Data preprocessing, including deduplication and cluster generation, was done within CiteSpace. The lack of public code or a frozen dataset snapshot limits full reproducibility.
Example End-to-End: The authors query WoS with 'Cloud Computing' restricted to computer science articles published 2014-2023, yielding ~69k records. Filtering on language and article type narrows this to ~12.6k articles. They input these into CiteSpace to generate co-citation networks. Using LLR, they identify clusters representing major domains such as federated learning (#5 cluster). They analyze temporal trends of publications per domain, generate country collaboration graphs to measure influence and connectivity, and extract keyword bursts (e.g. 'big data', 'IoT') to highlight emerging topics. The clustering reliability is quantified via modularity Q=0.6876 and silhouette S=0.8686. Top authors, papers, and journals are ranked by frequency and centrality metrics, revealing key contributors and outlets. This structured approach provides a comprehensive map of cloud computing research evolution over a decade.
Technical innovations
- Application of a large-scale, decade-long scientometric analysis (12,638 articles) of cloud computing research from 2014-2023 using the WoS Core Collection.
- Use of CiteSpace's log-likelihood ratio clustering combined with co-citation and co-word analyses to robustly identify and validate five main research domains within cloud computing.
- Integration of country collaboration networks with publication counts and centrality measures to characterize both quantity and quality of national research contributions.
- Temporal keyword burst detection for tracking emerging research frontiers and trending topics within cloud computing domains.
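The centrality measure behind the country-level comparison is betweenness centrality: the share of shortest paths between other nodes that pass through a given node. The sketch below implements Brandes' algorithm for a small unweighted, undirected graph and normalizes scores to the range 0 to 1. It is an illustration under those assumptions, not CiteSpace's code, and the node names are hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Normalized betweenness for an undirected, unweighted graph (Brandes)."""
    scores = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {v: [] for v in adjacency}
        path_count = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1.0 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    n = len(adjacency)
    # Each unordered pair is counted from both ends, so dividing by
    # (n-1)(n-2) both halves the total and normalizes by the pair count.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: score * scale for v, score in scores.items()}


# Hypothetical star network: every leaf collaborates only with the hub.
star = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}
centrality = betweenness_centrality(star)
print(centrality)
```

In the star example the hub lies on every shortest path between leaves, so it scores 1.0 while the leaves score 0. This shows how a node can rank high on centrality regardless of how many publications it contributes.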
Datasets
- Web of Science Core Collection — 12,638 cloud computing research articles (2014-2023) — subscription-based proprietary dataset
Baselines vs proposed
- Prior scientometric studies on cloud computing, covering periods such as 2001-2010 and 2008-2013, analyzed at most several thousand publications; this study analyzes 12,638 articles from 2014-2023.
- Federated Learning domain publications grew from 7 in 2014 to 1,791 in 2023 (~46% of domain's papers) compared to far slower growth in Secure Cloud Storage domain.
- Country publication counts: China 5,616 vs USA 2,060. Betweenness centrality: England 0.16 vs China 0.03, which the study interprets as a quality difference that publication volume does not capture.
- Top cited author Buyya, Rajkumar: 83 citations; next Choo, Kim-Kwang Raymond: 71 citations.
- Top cited journal IEEE ACCESS: 5,154 citations vs Future Generation Computer Systems: 5,046; Impact factor highest for IEEE Communications Surveys and Tutorials (35.6).
- Document co-citation network had density 0.0069, modularity Q=0.6876, silhouette S=0.8686 reflecting reliable cluster quality.
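The reported density can be checked against the node and link counts quoted earlier. The sketch below assumes the usual definition for an undirected network, links divided by the number of possible node pairs, N(N-1)/2. The country-network density it also prints is a derived illustration, not a figure reported in the paper.

```python
def network_density(nodes: int, links: int) -> float:
    """Fraction of possible undirected node pairs that are actually linked."""
    if nodes < 2:
        raise ValueError("density needs at least two nodes")
    possible_links = nodes * (nodes - 1) / 2
    return links / possible_links


# Document co-citation network: 1,422 nodes and 6,929 links.
doc_density = network_density(1422, 6929)
# Country collaboration network: 127 nodes and 1,211 links.
country_density = network_density(127, 1211)

print(f"document co-citation density: {doc_density:.4f}")
print(f"country collaboration density: {country_density:.4f}")
```

Under this definition the document network comes out at about 0.0069, matching the reported value, while the country network is far denser at about 0.15, as expected for a much smaller graph.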
Figures from the paper
Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2605.24499.

- Fig 1: Research Methodology
- Fig 2: PRISMA Flow Diagram for Record Selection
- Fig 3 (page 2).
- Fig 4 (page 2).
- Fig 3: Document Co-citation Analysis
- Fig 6 (page 4).
- Fig 7 (page 4).
- Fig 8 (page 4).
Limitations
- Analysis is limited to publications indexed in the Web of Science Core Collection, which excludes gray literature, non-English papers, and possibly newer open-access venues.
- Use of CiteSpace (paid software) without public release of processed data or code limits reproducibility by external researchers.
- No qualitative assessment of paper content, of impact beyond citations, or of detailed topic modeling for specific research challenges.
- No evaluation of distribution shifts beyond temporal slices or of the robustness of cluster assignments across parameter changes.
- Country-level geographic analysis aggregates institutional diversity, potentially masking variations in regional collaboration or innovation quality.
Open questions / follow-ons
- How do institutional collaboration networks within countries influence cloud computing research innovation and knowledge diffusion?
- Can additional data sources beyond WoS (e.g., Scopus, arXiv) be integrated to provide a more comprehensive and inclusive scientometric picture?
- How can semantic or content-based topic models complement co-citation clustering to better reveal nuance in cloud computing subfields?
- What are the impacts of recent open-access publishing trends on citation patterns and research dissemination in cloud computing?
Why it matters for bot defense
For bot-defense and CAPTCHA practitioners, this paper outlines the evolving research landscape of cloud computing, which underpins the scalable and secure backend infrastructure that CAPTCHA services run on. Understanding dominant trends such as federated learning and edge/fog computing helps inform how challenge content or bot analysis can be distributed closer to users, improving responsiveness and privacy. The identification of top authors and journals can guide deeper dives into technical advances applicable to defense mechanisms. Although the paper is not directly about bot or CAPTCHA defenses, its resource allocation, security, and workflow scheduling domains intersect with the cloud infrastructure that CAPTCHA deployment and attack mitigation depend on. The growth and collaboration maps also indicate where global research hubs are located, pointing to potential partners or sources of intelligence for bot-defense work. Practitioners can use this landscape to anticipate emerging cloud technologies that may affect CAPTCHA reliability and the scalability of abuse detection.
Cite
@article{arxiv2605_24499,
title={Cloud Computing Review: A Decade of Research},
author={Smruti Rekha Swain},
journal={arXiv preprint arXiv:2605.24499},
year={2026},
url={https://arxiv.org/abs/2605.24499}
}