Mammo-SAE

Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders
[DeepBreath, MICCAI 2025]

Mammo-SAE Framework

Mammo-SAE Framework. The SAE is first trained on patch-level CLIP features xj ∈ ℝd at any given layer, projecting them into a high-dimensional, interpretable sparse latent space z ∈ ℝh, and decoding them back for reconstruction. Once trained, the SAE is used to analyze which latent neurons are activated and what semantic information they encode. We also perform targeted interventions in the latent neuron space to assess their influence on downstream label prediction. We observe that the learned latents capture diverse regions such as nipple regions, mass regions, and background areas. Red boxes indicate ground-truth mass localization.

🔥 Highlights

  1. Mammo-SAE Introduction. We propose Mammo-SAE, a sparse autoencoder trained on visual features from Mammo-CLIP—a vision-language model pretrained on mammogram image–report pairs. Mammo-SAE aims to enhance interpretability in breast imaging by learning latent neurons that correspond to clinical concepts. This approach provides neuron-level insight beyond conventional post-hoc explanations.

  2. Mammo-SAE Framework. Our framework projects patch-level CLIP features into a high-dimensional sparse latent space, enabling reconstruction and interpretability. We identify monosemantic latent neurons whose activations align with meaningful breast cancer features such as masses and calcifications. We also perform targeted interventions to test how these neurons affect label prediction.

  3. Extensive Evaluation. We visualize spatial activations of latent neurons, showing that they frequently match clinical regions of interest. Our experiments reveal the presence of confounding factors influencing model decisions. Furthermore, finetuning Mammo-CLIP leads to more distinct latent neuron clusters and improved interpretability and performance.

🧠 Mammo-SAE Method Overview

Step 1: Feature Extraction

Given an input image I, we extract local features from a pretrained Mammo-CLIP model at a specific layer l. Each spatial position j in the feature map yields a vector xlj ∈ ℝd, where d is the feature dimension and Nl = Hl × Wl is the number of spatial locations.
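
As a rough sketch of this step (the `image_encoder` handle and its output shape are assumptions for illustration, not the released Mammo-CLIP API), the patch-level features can be obtained by flattening the backbone feature map:

```python
import torch

@torch.no_grad()
def extract_patch_features(image_encoder, images):
    """images: (B, 3, H, W) preprocessed mammograms."""
    feat_map = image_encoder(images)                      # (B, d, H_l, W_l), assumed output
    B, d, Hl, Wl = feat_map.shape
    # Each of the N_l = H_l * W_l spatial positions gives one patch feature x_j in R^d.
    patch_feats = feat_map.permute(0, 2, 3, 1).reshape(B, Hl * Wl, d)
    return patch_feats                                    # (B, N_l, d)
```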

Step 2: Sparse Autoencoder Training

The extracted feature xlj is encoded using weight matrix Wenc ∈ ℝd×h, passed through a ReLU nonlinearity, and decoded using Wdec ∈ ℝh×d. The training objective combines reconstruction and sparsity:

L = ‖Wdec · ReLU(Wenc · xj) − xj‖₂² + λ‖ReLU(Wenc · xj)‖₁

This encourages the autoencoder to reconstruct input features while activating only a small number of latent neurons, enabling interpretability.
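
A minimal PyTorch sketch of this objective is given below; the module layout, default sizes, and loss reductions are illustrative rather than the exact released implementation:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d=2048, h=16384):
        super().__init__()
        self.enc = nn.Linear(d, h)       # W_enc
        self.dec = nn.Linear(h, d)       # W_dec

    def forward(self, x):                # x: (N, d) patch features
        z = torch.relu(self.enc(x))      # sparse latent code z, (N, h)
        x_hat = self.dec(z)              # reconstruction, (N, d)
        return x_hat, z

def sae_loss(x, x_hat, z, lam=3e-5):
    recon = ((x_hat - x) ** 2).sum(dim=-1).mean()   # ||x_hat - x||_2^2 term
    sparsity = z.abs().sum(dim=-1).mean()           # ||z||_1 term (z >= 0 after ReLU)
    return recon + lam * sparsity
```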

Step 3: Identifying Concept-Neurons

After training, we compute the class-wise mean latent activation z̄(c) ∈ ℝh over all examples in class c ∈ {0, 1}:

z̄(c) = (1 / (|Dc| · Nl)) ∑x∈Dc ∑j=1…Nl ReLU(Wenc · xj)

Each latent neuron t is scored by its activation st(c) = z̄t(c), and top-scoring neurons are considered concept-aligned.
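
The following sketch (assuming the per-class latent activations have already been stacked into tensors) computes z̄(c) and extracts the top-k concept-aligned neurons:

```python
import torch

def class_mean_latents(z_by_class):
    """z_by_class: dict {class_id: (|D_c| * N_l, h) stacked latent activations}."""
    return {c: z.mean(dim=0) for c, z in z_by_class.items()}   # z_bar^(c) in R^h

def top_k_neurons(z_bar_c, k=10):
    """Return indices and scores s_t^(c) of the k most activated latent neurons."""
    scores, idx = torch.topk(z_bar_c, k)
    return idx.tolist(), scores.tolist()
```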

Step 4: Visualization and Semantic Probing

We visualize input patches that strongly activate each latent neuron. This reveals whether the neuron focuses on meaningful clinical patterns (e.g., masses, calcifications) or irrelevant areas.
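
A simple way to implement this probing, sketched below under the assumption that latent codes for a set of images are available as a single tensor, is to rank all image patches by their activation on a chosen neuron and display the highest-scoring ones:

```python
import torch

def top_activating_patches(z, neuron_t, k=16):
    """z: (num_images, N_l, h) latent codes; returns (image_idx, patch_idx, value) triples."""
    acts = z[..., neuron_t]                               # (num_images, N_l)
    vals, flat_idx = torch.topk(acts.flatten(), k)
    img_idx = torch.div(flat_idx, acts.shape[1], rounding_mode="floor")
    patch_idx = flat_idx % acts.shape[1]
    return list(zip(img_idx.tolist(), patch_idx.tolist(), vals.tolist()))
```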

Step 5: Latent Interventions

We intervene on the latent activations z = ReLU(Wenc · xj) by either retaining or suppressing specific neurons:

  • Top-k Activated: Keep only the top-k neuron activations:

    z′ = z ⊙ m,   where   mi = 1 if i ∈ Tk(0) ∪ Tk(1), else 0, and Tk(c) denotes the set of top-k class-level neurons identified for class c


  • Top-k Deactivated: Suppress the top-k neurons:

    z′ = z ⊙ (1 – m)

By comparing the model's outputs before and after intervention, we assess whether the top neurons carry meaningful information or reflect confounding artifacts.
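
A minimal sketch of both interventions (function and argument names are our own) follows; the intervened latent code is then decoded and passed to the downstream classifier, as described above:

```python
import torch

def intervene(z, top_neurons, mode="activate"):
    """z: (N, h) latent codes; top_neurons: indices in T_k(0) ∪ T_k(1)."""
    m = torch.zeros(z.shape[-1], device=z.device)
    m[torch.as_tensor(top_neurons, device=z.device)] = 1.0
    if mode == "activate":              # keep only the top-k neurons
        return z * m
    if mode == "deactivate":            # suppress the top-k neurons
        return z * (1.0 - m)
    raise ValueError(f"unknown mode: {mode}")
```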

🧪 Experimental Setup

Dataset. We use the VinDr-Mammo dataset [Nguyen et al., 2023], which contains approximately 20,000 full-field digital mammograms from 5,000 patients. The dataset includes expert annotations for breast-specific findings such as mass and suspicious calcification.

SAE Training. We train a single Sparse Autoencoder (SAE) on patch-level features extracted from the fine-tuned Mammo-CLIP model using the Vision-SAEs library. Specifically, we extract activations from the final layer of the EfficientNet-B5 backbone trained on the suspicious calcification classification task.

To ensure consistency and reduce computational overhead, we use a shared SAE across all experiments rather than training separate SAEs per model. This design ensures a common latent space and facilitates direct comparison across settings.

Hyperparameters. Input feature dimension is set to d = 2048, and we use an expansion factor of 8, resulting in a latent dimension h = 16,384. The SAE is trained for 200 epochs with a learning rate of 3 × 10−4, sparsity penalty λ = 3 × 10−5, and a batch size of 4096.
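
For reference, these values can be collected into a single configuration dict (the key names are our own; only the values come from the setup above):

```python
sae_config = {
    "input_dim": 2048,        # d, EfficientNet-B5 feature dimension
    "expansion_factor": 8,    # h = 8 * d = 16,384 latent neurons
    "epochs": 200,
    "learning_rate": 3e-4,
    "sparsity_lambda": 3e-5,
    "batch_size": 4096,
}
```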

Figure: Group interventions on class-level latent neurons, with the top-k class-level latent neurons either activated or deactivated, for the Suspicious Calcification and Benign Mass settings.

Figure: Visualization of class-level latent neurons for the Finetuned (Suspicious Calcification) model.

Figure: Visualization of class-level latent neurons for the Finetuned (Mass) model.

Figure: Visualization of class-level latent neurons for the Pretrained (Suspicious Calcification) model.

Figure: Visualization of class-level latent neurons for the Pretrained (Mass) model.

Citation


  @InProceedings{Nakka_2025_MICCAI,
    author    = {Nakka, Krishna Kanth},
    title     = {Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders},
    booktitle = {Proceedings of the Deep Breast Workshop on AI and Imaging for Diagnostic and Treatment Challenges in Breast Care, MICCAI 2025},
    month     = {September},
    year      = {2025},
  }

Acknowledgement

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. We thank CDA and BIA for releasing the pretrained models.
