CottonLeafVision: An Explainable and Robust Deep Learning Framework for Cotton Leaf Disease Classification

Source: arXiv:2606.14686 · Published 2026-06-12 · By Rafi Ahamed, Md. Abir Rahman, Tasnia Tarannum Roza, Munaia Jannat Easha, Md. Asif Khan, Sudeepta Mandal

TL;DR

This paper addresses the problem of accurately classifying cotton leaf diseases using deep learning, a critical task for improving cotton crop health and supporting agricultural economies. The authors present CottonLeafVision, a framework that evaluates and compares pretrained convolutional neural networks (DenseNet201, InceptionV3, VGG19) on a seven-class cotton leaf disease dataset encompassing six diseases plus healthy leaves collected under real-world field conditions. Among the models, DenseNet201 achieved the best classification accuracy at 98%, outperforming previous work. To boost reliability and interpretability, they applied adversarial training to enhance noise robustness, and used Grad-CAM and occlusion sensitivity methods to provide visual explanations of model decisions. A user-friendly web app prototype demonstrates practical deployment potential.

The results indicate that DenseNet201 not only reaches accuracy above prior approaches (e.g. 96.03% for the earlier InceptionV3 baseline by Bishshash) but also remains robust to adversarial perturbations, with validation accuracy sustained above 98.8% at the tested noise levels. The explainability techniques identify the disease-affected leaf regions that drive predictions, which supports user trust and applicability in real agricultural scenarios. The paper thus tackles accuracy, robustness, and explainability together for cotton leaf disease classification.

Key findings

  • DenseNet201 achieved a top test accuracy of 98% on the SAR-CLD-2024 cotton leaf disease dataset, outperforming InceptionV3 (97%) and VGG19 (93%).
  • Adversarial training on DenseNet201 maintained validation accuracy above 98.8% across epsilon perturbations from 0.1 to 0.2, peaking at 99.06% at ε=0.18 and 0.2 (Table II).
  • Precision, recall, and F1-score for DenseNet201 across disease classes ranged from 0.95 to 1.00, with Herbicide Growth Damage and Leaf Variegation reaching perfect (1.00) scores (Table III).
  • Grad-CAM visualizations highlighted infected leaf regions aligning with known disease symptoms, confirming model focus on relevant features (Fig. 9).
  • Occlusion sensitivity analysis showed accuracy declines when critical damaged leaf areas were masked, supporting the interpretability of model predictions.
  • The dataset comprised 2,137 original images, expanded through augmentation to 7,000 and split 70/20/10 into train, validation, and test sets, with images resized to 224x224 pixels.
  • DenseNet201 training used Adam optimizer at 0.0001 learning rate, batch size 32, up to 50 epochs with early stopping (patience 10).
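  • Compared with prior cotton leaf disease models, namely Bishshash's InceptionV3 (96.03% accuracy), Abudukelimu's CM-YOLO (mAP50 of 0.933), and Kaur's VGG16 (95.5% accuracy), CottonLeafVision reports a gain of roughly 2 to 5 points. The CM-YOLO figure is a detection mAP50 rather than a classification accuracy, so that comparison is not like-for-like.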
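
The training schedule listed above (up to 50 epochs, early stopping with patience 10) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `train_with_early_stopping` and the `val_losses` input are hypothetical stand-ins for a real training loop, and monitoring validation loss (rather than accuracy) is an assumption, since the summary does not say which quantity was monitored.

```python
def train_with_early_stopping(val_losses, max_epochs=50, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    val_losses[i] is the validation loss observed after epoch i + 1.
    It stands in for a real training loop (hypothetical input).
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    stop_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stop_epoch = epoch
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return stop_epoch, best_epoch
```

With a loss curve that bottoms out at epoch 3 and never improves again, training would halt at epoch 13; in practice the weights from the best epoch are usually the ones restored.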

Threat model

The "adversary" is modeled as input noise or perturbations, such as those arising from environmental or imaging variation, that could degrade classification performance. The adversarial training experiments simulate these as bounded perturbations with ε up to 0.2. The adversary cannot modify ground-truth labels, and data poisoning or evasion attacks that exceed this perturbation budget are out of scope.
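
To make the ε bound concrete, the sketch below applies a perturbation of magnitude at most ε to pixels normalized to [0, 1]. The summary does not state which attack or norm the authors used; the sign-of-gradient step (in the style of FGSM) and the L-infinity bound are assumptions made here for illustration, and `grad_signs` is a hypothetical input that would normally come from backpropagating the loss through the model.

```python
def perturb(pixels, grad_signs, epsilon):
    """Move each pixel by epsilon along the given sign, then clip to [0, 1].

    pixels: flat list of floats in [0, 1]
    grad_signs: flat list of -1, 0, or +1 (assumed sign of the loss gradient)
    """
    return [min(1.0, max(0.0, p + epsilon * s))
            for p, s in zip(pixels, grad_signs)]


def linf_distance(a, b):
    """Largest absolute per-pixel difference between two images."""
    return max(abs(x - y) for x, y in zip(a, b))
```

Because of the clipping step, the perturbed image never leaves the valid pixel range and never moves further than ε from the original in any single pixel.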

Methodology — deep read

  1. Threat model and assumptions: The study assumes a standard classification scenario in which the adversary is noise or perturbation in real-world cotton leaf images that could confuse the classifier. The adversary's capabilities are limited to the bounded perturbations used in adversarial training; poisoning and stronger evasion attacks are not addressed.

  2. Data: The dataset is SAR-CLD-2024, a publicly available cotton leaf disease image set containing 2,137 images spanning seven classes (six diseases plus healthy leaves), collected under varied field conditions. The dataset was augmented to 7,000 images by standard image transformations (not fully detailed). Images were resized to 224x224 pixels and normalized to [0,1]. The dataset was split into train (70%), validation (20%), and test (10%) subsets. Class imbalance in training was addressed through oversampling.

  3. Architectures: Three pretrained CNNs were evaluated: DenseNet201 (201 layers with dense connections), InceptionV3 (48 layers with inception modules), and VGG19 (19 layers with small 3x3 convolutional filters). Each model was fine-tuned on the cotton dataset, with its final classification layer adapted to 7 classes.

  4. Training: All models used Adam optimizer with learning rate 0.0001, batch size 32, and categorical cross-entropy loss suitable for multi-class classification. They were trained for up to 50 epochs with early stopping patience of 10 epochs to prevent overfitting. Training specifics such as hardware or random seeds are not detailed.

  5. Evaluation: Models were assessed on accuracy, precision, recall, and F1-score. Confusion matrices were examined per model to analyze class-wise performance and misclassifications. Adversarial robustness for DenseNet201 was tested via adversarial training with varying epsilon perturbations from 0 to 0.2. Explainability was evaluated visually using Grad-CAM and occlusion sensitivity analysis to validate that the model focuses on diseased leaf regions.

  6. Reproducibility: The dataset used is publicly available. There is no mention of code, trained weights, or hyperparameter configuration releases, limiting immediate reproducibility.
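
The data preparation in step 2 (a 70/20/10 split and scaling pixels to [0, 1]) can be sketched in a few lines. This is an illustrative sketch under assumptions: the helper names `split_indices` and `normalize_pixels`, the shuffling seed, and the use of 8-bit pixel values are not taken from the paper, and the summary does not say whether the split was made before or after augmentation.

```python
import random


def split_indices(n_items, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle item indices and cut them into train / validation / test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed for a repeatable split
    n_train = int(round(fractions[0] * n_items))
    n_val = int(round(fractions[1] * n_items))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0..255, assumed) into [0, 1]."""
    return [p / 255.0 for p in pixels]


train, val, test = split_indices(7000)
print(len(train), len(val), len(test))  # 4900 1400 700
```

Fixing the seed is one simple way to address the reproducibility gap noted in step 6, since the same indices land in each subset on every run.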

Example end-to-end: DenseNet201 was initialized with ImageNet-pretrained weights and its final layers were adapted for 7 output classes. Resized and normalized images were fed into the model. Training used Adam at a 0.0001 learning rate with categorical cross-entropy loss, batch size 32, and up to 50 epochs with early stopping. After training, the model achieved 98% accuracy on the test data. Adversarial training introduced input perturbations with ε up to 0.2 while keeping validation accuracy above 98.8%. Grad-CAM heatmaps were generated from the gradients of the class score with respect to the final convolutional feature maps; these gradients weight the feature maps into a coarse localization map that is overlaid on the input image to highlight infected leaf regions.
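
The core Grad-CAM computation is small enough to show without a deep learning framework. In the standard formulation, each channel of the last convolutional layer is weighted by the spatial average of the class-score gradient for that channel, the weighted maps are summed, and a ReLU keeps only positive evidence. The sketch below assumes the feature maps and gradients have already been extracted from the network as nested lists; the function name `grad_cam` and this input layout are illustrative choices, not the authors' implementation.

```python
def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap scaled to [0, 1].

    feature_maps[k][i][j]: activation of channel k at location (i, j)
    gradients[k][i][j]:    d(class score) / d(feature_maps[k][i][j])
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weights: global average of each channel's gradient.
    weights = [sum(sum(row) for row in g) / (height * width)
               for g in gradients]
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = [
        [max(0.0, sum(w * fm[i][j] for w, fm in zip(weights, feature_maps)))
         for j in range(width)]
        for i in range(height)
    ]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

The resulting map has the spatial size of the convolutional layer, so it would still need to be upsampled to 224x224 before being overlaid on a leaf image.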

Technical innovations

  • Integrated adversarial training with DenseNet201 for cotton leaf disease detection to improve noise robustness, surpassing previous CNN approaches that focused mainly on accuracy.
  • Combined Grad-CAM and occlusion sensitivity analyses to provide a dual-method explainability framework demonstrating model focus on disease-affected leaf regions.
  • Developed a lightweight web-based prototype application (CottonLeafVision) enabling real-time deployment of the DenseNet201-based classifier with visual explanations for field usability.
  • Performed comprehensive evaluation across three pretrained CNN architectures on the SAR-CLD-2024 dataset, establishing DenseNet201 as a superior model balancing accuracy and interpretability.
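
Occlusion sensitivity, the second explainability method listed above, can be sketched as a sliding mask: cover one patch at a time and record how much the model's score drops. The sketch below is a minimal version under assumptions; `score_fn` is a hypothetical callable standing in for the trained classifier's confidence in the predicted class, and the patch size, stride, and fill value are illustrative defaults rather than the settings used in the paper.

```python
def occlusion_sensitivity(image, score_fn, patch=2, stride=2, fill=0.0):
    """Return (top, left, score_drop) for every occluded patch position.

    image: 2D list of pixel values
    score_fn: callable mapping an image to a scalar score (hypothetical
              stand-in for the classifier's confidence in one class)
    """
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    drops = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            occluded = [row[:] for row in image]  # copy before masking
            for i in range(top, top + patch):
                for j in range(left, left + patch):
                    occluded[i][j] = fill
            drops.append((top, left, baseline - score_fn(occluded)))
    return drops
```

Patches with the largest score drop are the regions the model depends on most, which is the behavior the paper reports for damaged leaf areas.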

Datasets

  • SAR-CLD-2024 — 2,137 original images, augmented to 7,000 images — publicly available via Mendeley Data (doi:10.17632/b3jy2p6k8w.2)

Baselines vs proposed

  • Bishshash (InceptionV3): Accuracy = 96.03% vs DenseNet201: 98%
  • Abudukelimu (CM-YOLO): mAP50 = 0.933 vs DenseNet201: Accuracy 98% (different metrics, so not a like-for-like comparison)
  • Kaur (VGG16): Accuracy = 95.5% vs DenseNet201: 98%
  • DenseNet201: Precision/Recall/F1 scores between 0.95-1.00 vs InceptionV3: 0.91-1.00, VGG19: 0.84-0.99 (class dependent)
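
The per-class precision, recall, and F1 ranges quoted above all derive from a confusion matrix. The sketch below shows that derivation under the common convention that rows are true classes and columns are predicted classes; the function names and the toy two-class matrix in the usage note are illustrative and are not data from the paper.

```python
def per_class_metrics(confusion):
    """Return a (precision, recall, f1) tuple for each class.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # column minus diagonal
        fn = sum(confusion[c]) - tp                       # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total
```

For a toy matrix `[[8, 2], [1, 9]]`, class 0 has recall 0.8 and overall accuracy is 0.85. Detection metrics such as mAP50 are computed differently (from ranked box predictions), which is why they cannot be read off a classification confusion matrix.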

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.14686.

Fig 1: shows the whole work process of the proposed

Fig 2: Sample images from the dataset

Fig 3: illustrates the training and validation performance

Fig 4: Confusion Matrix of VGG19 architecture

Fig 5: Loss for training and validation of the InceptionV3 architecture

Fig 6: Confusion Matrix of InceptionV3

Fig 7: Training and validation accuracy and loss curve of DenseNet201

Fig 8: Confusion Matrix of DenseNet201

Limitations

  • Dataset size is relatively small (2,137 original images) despite augmentation, which may limit model generalization.
  • Evaluation is limited to a single publicly available dataset without external multi-center or cross-device validation, restricting generalizability.
  • Adversarial robustness was tested only for input noise perturbations up to epsilon 0.2; robustness against other realistic domain shifts or adversarial attacks was not evaluated.
  • Augmentation and oversampling methods are not described in detail, so it is unclear exactly how class imbalance was addressed.
  • No information on hardware or random seeds used in training, which can impact reproducibility.
  • Explainability evaluation is qualitative and limited to Grad-CAM and occlusion sensitivity; no quantitative user studies or uncertainty quantification were performed.

Open questions / follow-ons

  • How does CottonLeafVision generalize to other cotton datasets collected across different geographic regions or imaging conditions?
  • Can uncertainty quantification methods be integrated to estimate model confidence when deployed in the field?
  • What is the model performance against physically realizable adversarial attacks or sensor distortions common in agricultural settings?
  • How usable and interpretable are the Grad-CAM and occlusion sensitivity explanations for farmers and agricultural experts in practice?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners, CottonLeafVision provides a compelling example of balancing high accuracy with robustness and explainability when deploying machine learning models in real-world noisy environments. The use of adversarial training to harden DenseNet201 against input perturbations parallels the need for robust defenses in security-sensitive applications. The integration of explainability techniques, Grad-CAM and occlusion sensitivity, illustrates practical methods to audit and understand model behavior beyond simple performance metrics, supporting trust in automated decisions.

Although this paper targets agricultural disease classification, the modeling approach it demonstrates (pretrained CNNs, adversarial robustness training, and explainability analysis) could plausibly carry over to bot detection or CAPTCHA analysis pipelines, where reliability under attack and interpretable decisions are crucial. The dataset augmentation and validation protocol also offer a useful reference for practitioners designing classifiers that must tolerate distribution shift or noisy inputs.

Cite

@article{arxiv2606_14686,
  title={CottonLeafVision: An Explainable and Robust Deep Learning Framework for Cotton Leaf Disease Classification},
  author={Rafi Ahamed and Md. Abir Rahman and Tasnia Tarannum Roza and Munaia Jannat Easha and Md. Asif Khan and Sudeepta Mandal},
  journal={arXiv preprint arXiv:2606.14686},
  year={2026},
  url={https://arxiv.org/abs/2606.14686}
}

Articles are CC BY 4.0 — feel free to quote with attribution