Static Attribution of Android Residential Proxy Malware Using Graph Kernels

Source: arXiv:2604.27302 · Published 2026-04-30 · By Peter Clark, Yong Guan, Zhonghao Liao

TL;DR

This paper tackles static attribution of Android residential proxy malware APKs to specific commercial proxy networks. These proxy applications, often embedded as monetization SDKs within benign host apps, covertly route traffic for malicious purposes such as ad fraud and credential abuse. Prior research focused mainly on network-level measurements and string-based signatures, which struggle with obfuscation, code reuse, and shared third-party libraries.
The authors present a static-analysis pipeline that extracts two structural graph representations, control-flow graphs (CFGs) and function-call graphs (FCGs), from APK Dalvik bytecode and applies Weisfeiler-Lehman (WL) graph kernels to encode them as fixed-dimensional feature vectors. They combine these graph features with behavioral capability vectors extracted from native code with the capa tool, then train multiple classifiers under 5-fold cross-validation grouped by DEX file hash reuse to prevent leakage. The best classifier, SGD with linear SVM loss, achieves a macro F1 of 0.985 on a four-family attribution task over an expanded dataset of 3,365 APKs. Yara rules derived automatically from classifier feature importance enable explainable attribution, with per-family accuracies up to 88.45%. An open-world detection setting that adds 1,000 arbitrary non-proxy apps yields a macro F1 of 0.963. The dataset expansion technique, which enumerates DEX reuse, roughly doubles the labeled corpus (from 1,629 to 3,365 APKs) and enables a more robust evaluation. Finally, the authors analyze live APKPure samples, finding that over half still embed proxy SDKs, and link 23 developer accounts to multiple proxy-containing apps, suggesting ongoing commercial relationships.
This work demonstrates how structural static analysis combined with graph kernel learning can provide highly accurate, explainable family attribution of Android residential proxy malware despite pervasive code reuse and obfuscation.

Key findings

Dataset expanded from 1,629 to 3,365 APKs by enumerating shared DEX files using VirusTotal pivots (Table 1).
SGD classifier with WL graph kernel features (256 dimensions per graph type, 2 iterations) achieves macro F1 = 0.985 on four-class proxy family attribution under 5-fold DEX-grouped cross-validation (Table 4).
Adding behavioral capability vectors from native code analysis (capa tool) produces a 546-dimensional fused feature vector and raises SGD macro F1 from 0.980 (WL only) to 0.985.
Open-world detection including 1,000 non_proxy APKs yields macro F1 = 0.963 for five-class classification (4 proxy families + non-proxy).
Automatically generated Yara rules from classifier explanations achieve per-family detection accuracies up to 88.45% after removing non-discriminative signatures (Figure 7).
51.4% of APKs currently available from APKPure still embed residential proxy SDK code, based on inference from the detection model.
23 developer accounts submitted multiple APKs containing proxy SDKs, indicating ongoing commercial linkages between proxy providers and app developers.
Random Forest (macro F1 = 0.967) and XGBoost (macro F1 = 0.973) slightly underperform SGD (macro F1 = 0.985), suggesting the feature representation is well suited to linear models.

Threat model

The adversary is a residential proxy malware operator who embeds proxy SDKs within Android host applications and distributes them covertly. The operator attempts to evade detection by reusing code, embedding third-party libraries, and applying obfuscation. The defender’s goal is attribution by static analysis of the APK bytecode alone, without relying on network-level identifiers, which the proxy backconnect architecture obscures. The approach assumes the adversary cannot alter the structural graph properties captured by Weisfeiler-Lehman kernels without significant engineering effort.

Methodology — deep read

The authors start from a labeled corpus of 3,365 residential proxy APKs, expanded from the original Mi et al. dataset by enumerating shared DEX file hashes across applications. The corpus comprises four known proxy families: IPNinja, Luminati, Monkeysocks, and Oxylabs. Detection experiments add a non_proxy class.
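The DEX-reuse expansion described above can be illustrated with a minimal sketch. It assumes a one-hop pivot in which an unlabeled APK inherits a family label when it shares at least one DEX hash with labeled seed APKs of exactly one family. The function name, the input dictionaries, and the rule for skipping ambiguous matches are hypothetical; the paper's actual enumeration via VirusTotal may differ.

```python
from collections import defaultdict

def expand_by_dex_reuse(seed_labels, apk_to_dex):
    """Propagate family labels across APKs that share a DEX file hash.

    seed_labels: {apk_id: family} for the original labeled corpus.
    apk_to_dex:  {apk_id: set of DEX hashes} for seeds and candidates
                 (hypothetical input, e.g. gathered from a hash lookup).
    """
    # Index every DEX hash seen in a labeled APK by the families using it.
    dex_to_families = defaultdict(set)
    for apk, family in seed_labels.items():
        for dex in apk_to_dex.get(apk, ()):
            dex_to_families[dex].add(family)

    expanded = dict(seed_labels)
    for apk, dexes in apk_to_dex.items():
        if apk in expanded:
            continue
        families = set()
        for dex in dexes:
            families |= dex_to_families.get(dex, set())
        # Assumption: only unambiguous matches receive a label.
        if len(families) == 1:
            expanded[apk] = next(iter(families))
    return expanded
```

Skipping APKs that match more than one family is a conservative choice for this sketch: it trades corpus size for label purity.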

Threat Model and Assumptions: The adversary is a proxy malware operator distributing SDK-embedded proxy code across many host APKs, utilizing code reuse, obfuscation, and shared third-party libraries. The defender does not rely on network-level artifacts which are opaque due to backconnect proxying. Instead, the defender attempts static attribution based on structural program properties. They assume no prior knowledge of the target APK other than its raw binary.
Data: They use APKs collected by Mi et al., labeled by their network connections to backconnect infrastructure. They expand the dataset by pivoting on identical DEX file hashes to discover additional APKs sharing proxy SDK code. Methods from the standard library and common third-party packages are filtered out by package prefix to isolate proxy SDK logic, and the largest connected component of each control-flow and function-call graph is extracted.
Architecture and Feature Extraction: They extract per-method control-flow graphs (CFGs) and aggregate them into a single application-level CFG representing intra-method control transfers. They also extract function-call graphs (FCGs) representing inter-method call relationships. Both CFGs and FCGs are directed graphs. To vectorize these graphs, they apply a direction-aware Weisfeiler-Lehman (WL) subtree graph kernel with 2 iterations, aggregating predecessor and successor neighborhoods separately to preserve edge direction. The WL labels are hashed into a fixed 256-dimensional feature vector per graph type using signed hash projections, which limit the bias introduced by hash collisions. Concatenating the CFG and FCG vectors yields a 512-dimensional representation.
Behavioral Features: For the subset of APKs containing native code, they extract shared libraries, decompile them with Ghidra, and analyze them with the capa tool, which detects the presence of 34 behavioral capabilities indicative of malware functionality. These are encoded as a 34-dimensional binary presence vector and concatenated with the graph kernel vectors to form a 546-dimensional feature vector (512 + 34). Samples lacking native code receive an all-zero capability vector.
Classifiers: They train three classifiers on these features: a linear SGD classifier with SVM loss, a Random Forest with 100 trees, and an XGBoost gradient-boosted tree ensemble. Training uses 5-fold cross-validation in which folds are grouped by connected components of the shared-DEX-hash graph, so that no DEX file appears in both training and test folds, eliminating leakage from code reuse.
Evaluation: The main metric is macro-averaged F1, reported for the four-class proxy-family attribution task and for a five-class open-world detection scenario that adds a non_proxy class of 1,000 random APKs. Ablations cover feature sets (WL-only vs WL + capa), classifiers, and class-weighting schemes that address class imbalance. SHAP and LIME explainability methods identify influential WL feature buckets, which are traced back to code structures. These mappings support automatic generation of per-family Yara detection rules, whose discriminative power is then assessed.
Reproducibility: The paper notes its dependence on external services and repositories such as VirusTotal, APKMonk, and Androzoo for APK collection. It does not explicitly state that code for WL extraction and classification is released, but the methodology is described in enough detail to support reimplementation. DEX reuse enumeration offers a general approach to dataset expansion in SDK-heavy malware domains.
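The DEX-grouped cross-validation described under Classifiers can be sketched in plain Python. This is an illustration under assumptions, not the authors' code: APKs are linked whenever they share a DEX hash, connected components are found with union-find, and whole components are assigned greedily to the currently smallest fold. The function names and the greedy assignment are hypothetical; a library splitter such as scikit-learn's `GroupKFold` could consume the same group ids.

```python
from collections import defaultdict

def dex_groups(apk_to_dex):
    """Map each APK to a group id: its connected component under DEX sharing."""
    parent = {apk: apk for apk in apk_to_dex}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # DEX hash -> first APK seen containing it
    for apk, dexes in apk_to_dex.items():
        for dex in dexes:
            if dex in first_owner:
                parent[find(apk)] = find(first_owner[dex])
            else:
                first_owner[dex] = apk
    return {apk: find(apk) for apk in apk_to_dex}

def assign_folds(groups, n_folds=5):
    """Place whole groups into folds, largest group first, smallest fold first."""
    members = defaultdict(list)
    for apk, group in groups.items():
        members[group].append(apk)

    fold_sizes = [0] * n_folds
    fold_of = {}
    ordered = sorted(members.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _, apks in ordered:
        target = fold_sizes.index(min(fold_sizes))
        for apk in apks:
            fold_of[apk] = target
        fold_sizes[target] += len(apks)
    return fold_of
```

Because a component is never split, any DEX file lands in exactly one fold, which is the leakage guarantee the evaluation relies on. The cost is that fold sizes can be uneven when one component is very large.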

End-to-End Example: An APK is disassembled with Androguard; CFGs and FCGs are extracted and pruned of standard-library methods; WL features are computed with direction-aware neighborhood aggregation and projected by signed hashing into a 256-dimensional vector per graph. The two vectors are concatenated, and the capa capability vector is appended (all zeros when no native code is present). The resulting vector is passed to the SGD classifier, which was trained on other APKs, excluding any that share DEX hashes with this one. The classifier outputs a family label. SHAP values reveal which WL hash buckets contributed most, enabling synthesis of Yara rules that target those code patterns for forensic triage.
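The feature-extraction step in this example can be sketched as follows. This is a minimal illustration, assuming graphs are given as successor lists and that every node carries an initial string label (the summary does not state how the paper initializes node labels, so that input is hypothetical). The hash function, bucket and sign derivation, and the function names are likewise assumptions rather than the authors' implementation.

```python
import hashlib

def _h(text):
    """Stable 64-bit hash of a string."""
    digest = hashlib.blake2b(text.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def wl_signed_features(succ, labels, iterations=2, dim=256):
    """Direction-aware WL relabeling with signed feature hashing.

    succ:   {node: list of successor nodes}; every node must appear in labels.
    labels: {node: initial label string} (hypothetical initial labeling).
    """
    nodes = sorted(labels)
    pred = {n: [] for n in nodes}
    for u in nodes:
        for v in succ.get(u, ()):
            pred[v].append(u)

    vec = [0.0] * dim
    current = dict(labels)
    for it in range(iterations + 1):
        # Accumulate the labels of this iteration into the hashed vector.
        for n in nodes:
            h = _h(f"{it}:{current[n]}")
            sign = 1.0 if (h >> 63) & 1 else -1.0  # top bit picks the sign
            vec[h % dim] += sign                   # low bits pick the bucket
        if it == iterations:
            break
        # Relabel: predecessors and successors are aggregated separately.
        nxt = {}
        for n in nodes:
            ins = ",".join(sorted(current[p] for p in pred[n]))
            outs = ",".join(sorted(current[s] for s in succ.get(n, ())))
            nxt[n] = format(_h(f"{current[n]}|in:{ins}|out:{outs}"), "016x")
        current = nxt
    return vec

def fuse(cfg_vec, fcg_vec, capa_bits=None, n_capa=34):
    """Concatenate CFG and FCG vectors with a capability vector (zeros if absent)."""
    capa = list(capa_bits) if capa_bits is not None else [0.0] * n_capa
    return cfg_vec + fcg_vec + capa
```

Keeping the predecessor and successor multisets apart is what lets a chain `a -> b -> c` produce a different vector from `a -> b <- c`, even though the two graphs are identical once direction is ignored.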

Technical innovations

Application of direction-aware Weisfeiler-Lehman subtree graph kernels to combined control-flow and function-call graphs for static APK attribution.
Dataset expansion via DEX file hash reuse pivoting to discover and label thousands of additional residential proxy APKs.
Fusion of structural graph kernel features with behavioral capability vectors extracted via native code analysis and Mandiant’s capa tool.
Mapping classifier feature importances back to the code structures that produce them, enabling automated Yara signature generation for explainable malware attribution.
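To make the feature-importance step concrete, the sketch below ranks hash buckets for one sample under a linear model. It is a stand-in, not the authors' SHAP or LIME setup: for a linear model with features treated as independent, the SHAP value of feature j reduces to w_j * (x_j - E[x_j]), which is what the code computes. The function names and inputs are hypothetical, and mapping a bucket back to code structures or Yara strings is not shown.

```python
def linear_attributions(weights, x, background_mean):
    """Per-feature contribution w_j * (x_j - mean_j) for one class of a linear model."""
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, background_mean)]

def top_buckets(attributions, k=20):
    """Return (bucket_index, attribution) pairs, largest magnitude first."""
    ranked = sorted(enumerate(attributions), key=lambda p: (-abs(p[1]), p[0]))
    return ranked[:k]
```

For example, with weights `[2.0, -1.0, 0.5]`, sample `[1.0, 3.0, 0.0]`, and background mean `[0.5, 1.0, 0.0]`, the attributions are `[1.0, -2.0, 0.0]`, so bucket 1 ranks first.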

Datasets

Mi et al. residential proxy corpus: expanded from 1,629 to 3,365 APKs; collected via network backconnect patterns and VirusTotal hash enumeration.
Open-world detection dataset: 4,365 APKs combining the 3,365 proxy APKs with 1,000 random non_proxy APKs; sourced from APKMonk, VirusTotal, and Androzoo.

Baselines vs proposed

SGD classifier: macro F1 = 0.985 (expanded 4-class attribution) vs Random Forest: macro F1 = 0.967 vs XGBoost: macro F1 = 0.973
WL features only: macro F1 = 0.980 vs WL + capa: macro F1 = 0.985 (SGD classifier, 4-class attribution)
Open-world 5-class classification (4 proxy families + non_proxy): macro F1 = 0.963 (SGD classifier)
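Because every result above is a macro-averaged F1, it helps to see how that metric treats small classes. The sketch below is a plain-Python illustration with made-up labels, not data from the paper: per-class F1 is computed as 2TP / (2TP + FP + FN) and the classes are then averaged with equal weight.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # The denominator is nonzero: c occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)

# Toy example: one of two minority-class samples is misclassified.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "A", "A", "B"]
score = macro_f1(y_true, y_pred)  # (8/9 + 2/3) / 2 = 7/9
```

In the toy example accuracy is 5/6, while macro F1 is 7/9: a single error in the small class costs more under macro averaging, which is why the metric suits a corpus where one family has only a few dozen samples.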

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2604.27302.

Fig 1: Overview of the feature extraction and classification pipeline. Each APK is disassembled to extract CFG and FCG

Fig 2: Illustration of the two graph representations extracted from each APK. Left: control-flow graphs capture intra-method

Fig 3: Five-class confusion matrix on the open-world

Fig 4: Best-classifier macro-F1 across dataset configura-

Fig 5: Top 20 SHAP feature importances for the best clas-

Fig 6: Top 20 LIME feature importances for the best clas-

Fig 7: Per-family Yara rule accuracy

Limitations

Dataset is heavily imbalanced: 91% of APKs come from Luminati and Monkeysocks, and the smallest class (IPNinja) has only 35 samples.
Behavioral features from native code analysis apply only to the 56% of APKs that contain native shared libraries; the remainder have zeroed capa vectors.
No dynamic analysis or runtime behavioral validation is included; purely static analysis may miss obfuscation that is only resolved at runtime.
Obfuscation beyond standard graph transformations may limit detection where proxy SDKs are heavily modified or polymorphic.
Dataset expansion relies on VirusTotal and APKMonk availability, which may introduce sampling bias and temporal staleness.
No adversarial robustness evaluation: an attacker could conceivably alter SDK structure to evade WL kernel signatures.

Open questions / follow-ons

Can dynamic or hybrid analysis features complement graph kernels to improve robustness against obfuscation and polymorphism in these proxy SDKs?
How well does the WL kernel approach generalize to entirely unseen proxy families or variants with structurally modified SDKs?
Could adversarial machine learning techniques undermine classifier-based attribution by generating graph kernel collisions or poisoning?
What scalable, automated approaches exist to maintain Yara rule sets as proxy SDKs evolve or proliferate across new host applications?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, this paper offers a compelling example of applying structural static analysis to a class of Android malware that surreptitiously enrolls devices as residential proxies—an emerging vector for sophisticated botnets and proxy-based evasion. The demonstrated high-accuracy attribution and detection pipeline could inform defensive tooling for identifying and blocking proxy-enabled abuse on mobile endpoints. The fusion of graph-kernel features with behavioral signatures and the ability to generate explainable Yara rules provide actionable insights for signature creation and forensic triage. Moreover, the carefully designed cross-validation grouped by DEX reuse highlights how to evaluate attribution systems while avoiding evaluation leakage common in SDK-driven malware. However, the reliance on static code structure implies potential evasion by adversaries willing to significantly alter or obfuscate proxy SDK internals, so integrating runtime signals may be necessary for hardened detection. Overall, these methods provide a valuable template for advancing the bot-defense community’s capabilities to fingerprint and attribute complex multi-family proxy malware distributed via mobile ecosystems.

Cite

@article{arxiv2604_27302,
  title={Static Attribution of Android Residential Proxy Malware Using Graph Kernels},
  author={Peter Clark and Yong Guan and Zhonghao Liao},
  journal={arXiv preprint arXiv:2604.27302},
  year={2026},
  url={https://arxiv.org/abs/2604.27302}
}
