Atlas H&E-TME: Scalable AI-Based Tissue Profiling at Expert Pathologist-Level Accuracy

Source: arXiv:2606.12346 · Published 2026-06-10 · By Kai Standvoss, Miriam Hägele, Rosemarie Krupar, Julika Ribbat-Idel, Jennifer Altschüler, Gerrit Erdmann et al.

TL;DR

Atlas H&E-TME addresses scalable, quantitative tissue profiling from routine hematoxylin and eosin (H&E) whole-slide images (WSIs), which are ubiquitous in pathology but mostly read qualitatively. Prior methods lacked one or more of the following: broad cancer type coverage, molecular ground truth validation, or large-scale evaluation across diverse technical conditions. This work introduces an AI system built on large pathology foundation models that classifies tissue quality, compartment types, and nine cell types at cell-level resolution across eight solid tumor types and common metastases, yielding over 4,500 quantitative metrics per slide. The core innovation is a dual validation approach combining (1) in-depth, molecularly grounded evaluation using an IHC-informed multi-pathologist consensus protocol to overcome the morphological ambiguities and inter-rater variability inherent in H&E-only annotations, and (2) large-scale in-breadth validation on 1,500+ cases with more than 200,000 high-confidence H&E annotations spanning numerous cancer types, metastatic sites, sources, and scanner models. Measured against the IHC-informed consensus, Atlas H&E-TME matches or exceeds expert pathologists' H&E-only performance on the most morphologically ambiguous immune cell classes, while generalizing robustly across wide morphological and technical diversity. The system converts ubiquitous H&E slides into scalable, quantitative windows into tumors and their microenvironments, enabling next-generation biomarker discovery for translational and clinical research.

Key findings

IHC-informed multi-pathologist consensus improved inter-rater reliability substantially over H&E-only annotation: Krippendorff’s α increased from 0.72 to 0.85 for granulocytes, 0.74 to 0.85 for plasma cells, and 0.56 to 0.74 for macrophages (Fig 3).
Atlas H&E-TME achieved macro F1=0.74 across five cell classes against the IHC-informed consensus, matching or exceeding the pathologist mean H&E-only F1=0.71 (10k stratified bootstrap CI, Fig 4).
Allowing abstention for uncertain cell calls raised absolute F1 for both model and pathologists, with Atlas H&E-TME maintaining performance advantage at matched coverage (Fig 5).
Atlas H&E-TME supports tissue quality control, segmentation into seven tissue classes, and classification of nine cell types, yielding >4,500 quantitative spatial and morphological features per slide.
The large-scale breadth cohort spans 1,500+ cases, 8 cancer types with >90% subtype coverage per type, 5 common metastasis sites, over 25 tissue sources and biobanks, and 8+ scanner models, with over 200,000 high-confidence pathologist annotations.
Atlas H&E-TME generalizes consistently and robustly across this wide morphological and technical diversity, demonstrating scale and broad indication applicability.
Bleach-and-restain IHC coregistered to the same physical section provided molecular ground truth at cell level, addressing the morphological ambiguity and inter-rater variability that limit H&E-only ground truth.
The confidence scoring of Atlas H&E-TME is informative for uncertainty: cells the model abstains from are those it is most likely to misclassify.
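
The agreement figures above are Krippendorff's α values. As a rough illustration of what that statistic measures, the sketch below computes α for nominal labels from a coincidence matrix. It is a generic textbook formulation, not the authors' code; the function name `krippendorff_alpha_nominal` and the input layout (one list of rater labels per cell, `None` for a missing rating) are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists; each inner list holds the labels that the raters
    gave to one cell (None marks a missing rating). Hypothetical layout.
    """
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # a unit rated by fewer than two raters cannot be paired
        # Every ordered pair of ratings from different raters contributes
        # 1 / (m - 1) to the coincidence matrix.
        for a, b in permutations(range(m), 2):
            coincidences[(values[a], values[b])] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable ratings.
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable ratings")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected
```

Called on per-cell label lists from five raters, a value near 1 indicates near-perfect agreement, 0 indicates agreement at chance level, and negative values indicate systematic disagreement.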

Threat model

n/a. The paper does not explicitly define or analyze adversarial threat models; it focuses on enabling accurate, scalable tissue profiling for research and clinical pathology use, assuming standard clinical-quality WSIs without malicious manipulation.

Methodology — deep read

Threat Model & Assumptions: The adversary is not explicitly modeled here; the focus is on clinical AI applicability. The main challenge addressed is overcoming morphological ambiguity in H&E alone, and variability among pathologist annotators. The model assumes high-quality scanned WSIs from diverse clinical sources, and uses a molecularly grounded pathologist consensus as the gold standard. Adversarial manipulation or attacks are not considered.
Data: Two main datasets underpin validation. For in-depth validation, 30 FFPE resection WSIs across colorectal carcinoma, non-small cell lung cancer (NSCLC), and urothelial carcinoma (10 per indication) were stained with H&E, then bleached and restained on the same physical section with a 5-plex IHC panel (targeting carcinoma cells, lymphocytes, granulocytes, plasma cells, macrophages). WSIs were scanned and coregistered to micrometer precision. Five board-certified pathologists independently annotated detected cells (using the model’s StarDist nuclei detection) first on H&E alone, then with IHC overlay after a 10-day washout. Consensus labels were assigned by majority vote, excluding ambiguous cells.
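
The consensus step described for the in-depth cohort can be sketched as a per-cell majority vote. The summary does not state the exact rule for what counts as ambiguous, so this sketch assumes a strict majority (at least 3 of 5 votes) and drops every other cell; the function name `consensus_label` and the `min_votes` parameter are hypothetical.

```python
from collections import Counter


def consensus_label(votes, min_votes=None):
    """Return the majority label for one cell, or None if it is ambiguous.

    votes: labels from the individual pathologists for a single cell.
    min_votes: votes required for consensus; defaults to a strict majority.
    Both the threshold and the None convention are assumptions.
    """
    if not votes:
        return None
    threshold = min_votes if min_votes is not None else len(votes) // 2 + 1
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= threshold else None


def build_consensus(votes_per_cell):
    """Map cell id -> consensus label, excluding ambiguous cells."""
    consensus = {}
    for cell_id, votes in votes_per_cell.items():
        label = consensus_label(votes)
        if label is not None:
            consensus[cell_id] = label
    return consensus
```

With this rule, a cell labeled lymphocyte by three raters and plasma cell by two keeps the label lymphocyte, while a 2/2/1 split is excluded from the ground truth.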

For in-breadth validation, a held-out cohort of 1,500+ H&E WSIs was assembled spanning eight solid tumor types plus five metastatic sites, drawing from >25 sources and scanned with 8+ devices, with >200,000 high-confidence pathologist annotations balanced across cell and tissue classes. This cohort covers >90% of clinical morphological subtypes per cancer type and includes varied sample types (biopsies, resections, FNAs).

Architecture/Algorithm: Atlas H&E-TME combines four model stages:

Tissue Quality Control (QC): Multi-scale semantic segmentation using a foundation model backbone (a DINO-self-supervised vision transformer) trained with combined cross-entropy and Dice loss to identify valid tissue, out-of-focus areas, artifacts, pen marks, and background.
Tissue Segmentation: Pixel-wise multilabel classification into seven tissue classes (carcinoma, normal epithelium, stroma, necrosis, blood, vessel, other) using similar multi-scale semantic segmentation built on the same foundation backbone.
Cell Detection and Classification: Cell nuclei detected via a custom StarDist nuclear segmentation model applied within valid tissue regions. Then, detected cells are classified into nine cell types (carcinoma cells, epithelial cells, fibroblasts, lymphocytes, plasma cells, macrophages, granulocytes, endothelial cells, other) by a multi-scale classifier built atop the foundation model. A focal loss mitigates class imbalance.
Tissue & Cell Metrics: Quantitative features computed from tissue and cell labels include cell counts, densities, ratios, nuclear morphology statistics, and spatial neighborhood statistics (co-occurrence, density within 20µm and 40µm radii).
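
To make the spatial neighborhood statistics in the last stage concrete, the following sketch counts neighbors of a given cell type within a fixed radius of every cell and converts the count to a density. It assumes cell centroids are already expressed in micrometers and that density means count per disc area; the paper's exact metric definitions are not given in this summary, and the function names are hypothetical. The brute-force loop is quadratic, so a real pipeline would likely use a spatial index.

```python
import math


def neighborhood_counts(cells, target_type, radius_um):
    """Per-cell count of `target_type` neighbors within `radius_um`.

    cells: list of (x_um, y_um, cell_type) tuples (assumed layout).
    The cell itself is never counted as its own neighbor.
    """
    r2 = radius_um ** 2
    counts = []
    for i, (xi, yi, _type_i) in enumerate(cells):
        count = 0
        for j, (xj, yj, type_j) in enumerate(cells):
            if i == j or type_j != target_type:
                continue
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                count += 1
        counts.append(count)
    return counts


def neighborhood_density(cells, target_type, radius_um):
    """Neighbors per square micrometer inside the disc around each cell."""
    area = math.pi * radius_um ** 2
    return [c / area for c in neighborhood_counts(cells, target_type, radius_um)]


# Toy example: one carcinoma cell with lymphocytes at 10, 30 and 100 um.
toy_cells = [
    (0.0, 0.0, "carcinoma"),
    (10.0, 0.0, "lymphocyte"),
    (30.0, 0.0, "lymphocyte"),
    (100.0, 0.0, "lymphocyte"),
]
lymph_within_20um = neighborhood_counts(toy_cells, "lymphocyte", 20.0)
lymph_within_40um = neighborhood_counts(toy_cells, "lymphocyte", 40.0)
```

For the carcinoma cell at the origin, the 20µm radius captures one lymphocyte and the 40µm radius captures two, which is the kind of per-cell value that could then be aggregated into slide-level features.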

Training Regime: The foundation model uses a vision transformer pretrained on large-scale diverse histopathology data with DINO self-supervised learning, enabling rich transferable features. Downstream task-specific models are supervised, trained on pathologist-informed annotations from diverse data sources. Details on epochs, batch size, hardware, random seeds, and hyperparameters were not fully disclosed.
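
The stage descriptions above name three losses: cross-entropy and Dice for the segmentation stages, and focal loss for cell classification. Because hyperparameters are not disclosed, the following is only a minimal sketch of the standard definitions for a single sample; `gamma=2.0`, `alpha=1.0`, `eps`, and all function names are illustrative assumptions rather than values from the paper.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability of the true class for one sample."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss: cross-entropy down-weighted for well-classified samples.

    With gamma = 0 and alpha = 1 this reduces to plain cross-entropy.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def soft_dice_loss(pred, truth, eps=1e-6):
    """1 - Dice overlap between predicted probabilities and a binary mask.

    pred and truth are flat sequences of equal length (one class channel).
    """
    intersection = sum(p * t for p, t in zip(pred, truth))
    return 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(truth) + eps)


def combined_segmentation_loss(probs, target, pred_mask, true_mask, dice_weight=1.0):
    """Cross-entropy plus weighted Dice; the 1:1 weighting is an assumption."""
    return cross_entropy(probs, target) + dice_weight * soft_dice_loss(pred_mask, true_mask)
```

The focal term shrinks the contribution of confidently correct samples, which is why it is a common choice when rare cell classes would otherwise be swamped by abundant ones.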
Evaluation Protocol:

In-depth: Performance was benchmarked against the IHC-informed five-pathologist consensus on 30 WSIs with multiple ROIs. Atlas's predictions and each pathologist's H&E-only annotations were compared on identical sets of detected cells using macro F1, with confidence intervals from 10,000 stratified bootstrap iterations. Abstention experiments assessed performance when uncertain cells could be excluded, comparing model and pathologists at matched coverage.
In-breadth: Atlas was validated with the large diverse held-out cohort with high-confidence pathologist annotations to assess consistency and robust generalization over eight tumor types, five metastatic sites, scanner types, and sample types. Multi-class classification metrics and tissue QC/segmentation metrics were reported.
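
The in-depth protocol reports macro F1 with a stratified bootstrap confidence interval. The sketch below shows one common way to compute both. The summary does not say which variable the bootstrap was stratified on, so this version assumes stratification by ground-truth class and a percentile interval; the function names and the `seed` parameter are hypothetical.

```python
import random


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def stratified_bootstrap_ci(y_true, y_pred, classes, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile CI for macro F1, resampling within each true class."""
    rng = random.Random(seed)
    by_class = {c: [i for i, t in enumerate(y_true) if t == c] for c in classes}
    stats = []
    for _ in range(n_boot):
        idx = []
        for c in classes:
            members = by_class[c]
            if members:
                # Resample with replacement, keeping each class size fixed.
                idx.extend(rng.choices(members, k=len(members)))
        stats.append(
            macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx], classes)
        )
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

Resampling within each class keeps rare classes such as macrophages represented in every replicate, which keeps the per-class F1 terms defined across iterations.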

Reproducibility: The OpenTME dataset, derived from TCGA WSIs processed by Atlas H&E-TME, was released publicly. Atlas H&E-TME itself is a commercial/research platform co-developed by several institutions. The paper does not explicitly state that code or model weights are publicly released. Detailed annotation protocols and the IHC consensus workflow are documented.

Example end-to-end: One colorectal carcinoma resection specimen was first H&E stained and scanned. The slide was bleached and restained with five molecular IHC stains targeting major immune and tumor cell classes, rescanned, and the images coregistered. Five pathologists annotated the same detected nuclei twice (H&E-only, then H&E plus IHC). The IHC-informed consensus served as the ground truth. Atlas H&E-TME's cell classification output on the original H&E was compared to each pathologist’s H&E-only annotations on identical nuclei, demonstrating parity or superiority. Subsequently, Atlas was applied to 1,500+ cases spanning multiple cancer types and scanners to confirm robustness.

Technical innovations

Introduction of a dual validation framework combining IHC-informed multi-pathologist consensus for depth and large-scale multi-cancer, multi-scanner pathologist-annotated cohorts for breadth.
Application of a large pretrained pathology foundation model (DINO-based vision transformer) as a backbone for multi-stage tissue QC, segmentation, and cell classification pipelines.
Development of a sequential bleach-and-restain protocol enabling same-section IHC coregistration to H&E, providing molecularly grounded ground truth for evaluating morphologically ambiguous cell classes.
Integration of uncertainty-aware classification allowing Atlas H&E-TME to abstain from ambiguous cell calls, improving precision while matching pathologist coverage.
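
The abstention idea in the last item can be illustrated with a simple selective-classification sketch: keep only the most confident predictions until a target coverage is reached, then score the retained cells. This is a generic illustration, not the authors' procedure. It assumes one scalar confidence per cell and uses accuracy as the score for brevity (macro F1 could be substituted); all names are hypothetical.

```python
def select_at_coverage(confidences, coverage):
    """Indices of the most confident predictions at the given coverage.

    coverage: fraction of cells to keep, e.g. the fraction a pathologist
    labeled without abstaining (assumed meaning of "matched coverage").
    """
    n_keep = round(coverage * len(confidences))
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    return sorted(order[:n_keep])


def selective_accuracy(y_true, y_pred, confidences, coverage):
    """Accuracy on the retained cells; None if nothing is retained."""
    kept = select_at_coverage(confidences, coverage)
    if not kept:
        return None
    correct = sum(1 for i in kept if y_true[i] == y_pred[i])
    return correct / len(kept)


# Toy example: the single error carries the lowest confidence.
toy_true = ["a", "b", "a", "b"]
toy_pred = ["a", "a", "a", "b"]
toy_conf = [0.9, 0.4, 0.8, 0.6]
full_accuracy = selective_accuracy(toy_true, toy_pred, toy_conf, 1.0)
half_accuracy = selective_accuracy(toy_true, toy_pred, toy_conf, 0.5)
```

In the toy data the misclassified cell has the lowest confidence, so abstaining on the least confident cells raises the score on the retained ones. That is the behavior the paper reports when it says the model's confidence is informative.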

Datasets

In-depth IHC-informed cohort — 30 WSIs — multi-institution surgical resections with sequential bleach-and-restain 5-plex IHC panel (private).
In-breadth H&E cohort — 1,500+ cases spanning 8 primary cancer types and 5 metastatic sites, 200,000+ pathologist annotations — drawn from 25+ sources and 8+ scanners (private).
OpenTME dataset — TCGA WSIs processed with Atlas H&E-TME for public research access.

Baselines vs proposed

Pathologist H&E-only mean macro F1: 0.71 vs Atlas H&E-TME macro F1: 0.74 (IHC-informed consensus ground truth, 5 cell classes, 10k bootstrap, Fig 4).
Atlas H&E-TME maintains superior or equal F1 over H&E-only pathologists across all five immune and carcinoma cell types evaluated, including macrophages and plasma cells.
Inter-rater agreement (Krippendorff’s α) among pathologists rose from 0.56–0.74 on H&E alone to 0.74–0.85 with IHC guidance (granulocytes, plasma cells, macrophages; Fig 3).

Figures from the paper

Figures are reproduced from the source paper for academic discussion. Original copyright: the paper authors. See arXiv:2606.12346.

Fig 1: Two example WSIs from TCGA with Atlas H&E-TME outputs across the Tissue QC, Tissue

Fig 2: Overview of the in-depth validation workflow. (a) Each FFPE resection section is first H&E-stained

Fig 3 (page 5).

Fig 4 (page 5).

Fig 5 (page 5).

Fig 6 (page 5).

Fig 7 (page 5).

Fig 8 (page 5).

Limitations

The IHC-informed in-depth validation cohort is relatively small (30 WSIs, 3 cancer types), limiting statistical power and cancer diversity for molecular grounding.
Training and validation cohorts are compiled from multiple sources, but details on hyperparameters, training epochs, and model selection are not fully disclosed, which limits exact reproducibility.
No explicit adversarial robustness evaluation or testing on poor-quality slides is reported beyond the tissue QC model.
The large-scale in-breadth evaluation uses pathologist H&E-only annotations as ground truth, which may still embed inter-rater variability even if it is minimized by training and guidelines.
Atlas H&E-TME currently supports eight solid tumor types and common metastatic sites; extension to other indications or rare subtypes is pending.
Public release is limited to the derived OpenTME dataset from TCGA; source code and pretrained weights for Atlas H&E-TME are not fully open-access.

Open questions / follow-ons

How well does Atlas H&E-TME perform on rare tumor subtypes or non-solid tumors not included in the current eight cancer types?
Can the dual validation approach be extended by incorporating multimodal molecular data beyond IHC, such as multiplex immunofluorescence or spatial transcriptomics, to further refine ground truth?
How robust is Atlas H&E-TME to significant distribution shifts, such as lower quality stains, artifacts, or novel scanner types not seen in training?
What are the practical implications of uncertainty-aware classification for deployment in clinical workflows? For example, can abstentions guide human review efficiently?

Why it matters for bot defense

For bot-defense and CAPTCHA practitioners interested in AI robustness and validation, this paper illustrates a rigorous approach to verifying a complex biomedical vision model against both molecular ground truth and large, diverse cohorts, and it highlights the challenges of label ambiguity and inter-rater variability. The dual validation framework is an example of how to evaluate models whose ground truth is inherently noisy or ambiguous: it suggests that claims of robustness need both a high-fidelity localized reference and broad-scale annotations. Although the domain is histopathology, the same principle (a higher-fidelity consensus reference plus large, diverse validation data) can inform CAPTCHA challenge design where attacker behavior or environment induces variability. Uncertainty estimation and the ability to abstain from low-confidence predictions may likewise be applied in bot-detection systems to improve precision and triage, paralleling the strategy used for the pathology model.

Cite

@article{arxiv2606_12346,
  title={Atlas H\&E-TME: Scalable AI-Based Tissue Profiling at Expert Pathologist-Level Accuracy},
  author={Kai Standvoss and Miriam Hägele and Rosemarie Krupar and Julika Ribbat-Idel and Jennifer Altschüler and Gerrit Erdmann and Hans Pinckaers and Evelyn Ramberger and Madleen Drinkwitz and Ádám Nárai and Alexander Möllers and Katja Lingelbach and Sebastian Kons and Lukas Hönig and Recepcan Adigüzel and Joana Baião and Alberto Megina Gonzalo and Marius Teodorescu and Marie-Lisa Eich and Paolo Chetta and Shakil Merchant and Verena Aumiller and Simon Schallenberg and Andrew Norgan and Klaus-Robert Müller and Lukas Ruff and Maximilian Alber and Frederick Klauschen},
  journal={arXiv preprint arXiv:2606.12346},
  year={2026},
  url={https://arxiv.org/abs/2606.12346}
}
