Mammo-SAE
We introduce Mammo-SAE, a sparse autoencoder trained on visual features from Mammo-CLIP, a vision-language model pretrained on mammogram image–report pairs. Mammo-SAE aims to enhance interpretability in breast imaging by learning latent neurons that correspond to clinical concepts, providing neuron-level insight beyond conventional post-hoc explanations.
Step 1: Feature Extraction
Given an input image I, we extract local features from a pretrained Mammo-CLIP model at a specific layer l. Each spatial position j in the feature map yields a vector x_j^l ∈ ℝ^d, where d is the feature dimension and N_l = H_l × W_l is the number of spatial locations.
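The sketch below illustrates this step under the assumption that the Mammo-CLIP image encoder behaves like a standard EfficientNet-B5 backbone; the use of the timm library, the 456×456 input size, and the random weights are placeholders for the actual fine-tuned checkpoint.

```python
# Minimal sketch of Step 1: flatten a layer-l feature map into N_l patch vectors.
# Assumptions: timm's EfficientNet-B5 stands in for the fine-tuned Mammo-CLIP
# encoder (random weights here), and a single preprocessed view is used.
import torch
import timm

backbone = timm.create_model("efficientnet_b5", pretrained=False)
backbone.eval()

image = torch.randn(1, 3, 456, 456)            # one preprocessed mammogram view
with torch.no_grad():
    x_l = backbone.forward_features(image)     # feature map of shape (1, d, H_l, W_l)

B, d, H_l, W_l = x_l.shape
N_l = H_l * W_l
# Each of the N_l spatial positions j yields one d-dimensional vector x_j^l.
patch_features = x_l.permute(0, 2, 3, 1).reshape(B * N_l, d)
print(patch_features.shape)                    # (N_l, d), with d = 2048 for B5
```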
Step 2: Sparse Autoencoder Training
The extracted feature x_j^l is encoded using a weight matrix W_enc ∈ ℝ^{h×d}, passed through a ReLU nonlinearity, and decoded using W_dec ∈ ℝ^{d×h}, where h is the latent dimension. The training objective combines reconstruction and sparsity:
L = ‖W_dec · ReLU(W_enc · x_j^l) − x_j^l‖₂² + λ ‖ReLU(W_enc · x_j^l)‖₁
This encourages the autoencoder to reconstruct input features while activating only a small number of latent neurons, enabling interpretability.
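A minimal PyTorch sketch of this objective follows; the module and loss names are illustrative rather than the paper's or the Vision-SAEs library's actual API, and the two terms are averaged over the batch as a common normalization choice.

```python
# Minimal sketch of Step 2: a ReLU sparse autoencoder with an L1 sparsity penalty.
# Class and function names are illustrative; hyperparameters follow the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, d: int, h: int):
        super().__init__()
        self.enc = nn.Linear(d, h)     # weight has shape (h, d), i.e. W_enc
        self.dec = nn.Linear(h, d)     # weight has shape (d, h), i.e. W_dec

    def forward(self, x):
        z = F.relu(self.enc(x))        # sparse latent code z
        x_hat = self.dec(z)            # reconstruction of x
        return x_hat, z

def sae_loss(x, x_hat, z, lam=3e-5):
    # Reconstruction term plus L1 penalty on the latent activations.
    return F.mse_loss(x_hat, x) + lam * z.abs().sum(dim=1).mean()

# One illustrative optimization step on a batch of patch features x_j^l.
d, h = 2048, 8 * 2048
sae = SparseAutoencoder(d, h)
opt = torch.optim.Adam(sae.parameters(), lr=3e-4)

x = torch.randn(4096, d)               # a batch of extracted patch features
opt.zero_grad()
x_hat, z = sae(x)
loss = sae_loss(x, x_hat, z)
loss.backward()
opt.step()
```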
Step 3: Identifying Concept-Neurons
After training, we compute the class-wise mean latent activation z̄(c) ∈ ℝ^h over all examples in class c ∈ {0, 1}:
z̄(c) = (1 / (|D_c| · N_l)) Σ_{x ∈ D_c} Σ_{j=1}^{N_l} ReLU(W_enc · x_j^l)
where D_c denotes the set of examples of class c.
Each latent neuron t is scored by its activation s_t(c) = z̄_t(c), and the top-scoring neurons are considered concept-aligned.
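The following sketch shows how z̄(c) and the top-k neuron set could be computed from a precomputed matrix of latent activations; the variable names and the patch-level label bookkeeping are assumptions for illustration.

```python
# Sketch of Step 3: class-wise mean latent activations and top-k concept neurons.
# Assumes `latents` holds z = ReLU(W_enc x_j^l) for every patch of every image,
# and `labels` repeats each image's class label (0 or 1) over its N_l patches.
import torch

def class_mean_activation(latents: torch.Tensor, labels: torch.Tensor, c: int) -> torch.Tensor:
    # latents: (num_patches, h); labels: (num_patches,) -> z̄(c) ∈ R^h
    return latents[labels == c].float().mean(dim=0)

def top_k_neurons(latents: torch.Tensor, labels: torch.Tensor, c: int, k: int = 10) -> torch.Tensor:
    # Rank neurons by s_t(c) = z̄_t(c) and return the index set T_k(c).
    z_bar = class_mean_activation(latents, labels, c)
    return torch.topk(z_bar, k).indices
```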
Step 4: Visualization and Semantic Probing
We visualize input patches that strongly activate each latent neuron. This reveals whether the neuron focuses on meaningful clinical patterns (e.g., masses, calcifications) or irrelevant areas.
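One possible way to implement this probing is sketched below: for a given neuron t, rank all patches by its activation and map the strongest ones back to their source images and grid positions. The bookkeeping arrays (image_ids, positions) are illustrative names, not part of the paper's code.

```python
# Sketch of Step 4: find the patches that most strongly activate latent neuron t.
# `image_ids` and `positions` map each row of `latents` back to its source image
# and its (row, col) cell in the H_l x W_l grid; both are illustrative.
import torch

def top_activating_patches(latents, image_ids, positions, t: int, k: int = 16):
    scores = latents[:, t]                       # activation of neuron t per patch
    top = torch.topk(scores, k).indices
    # Each hit identifies an image region that can be cropped and inspected
    # for clinical patterns such as masses or calcifications.
    return [(image_ids[i], positions[i], scores[i].item()) for i in top.tolist()]
```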
Step 5: Latent Interventions
We intervene on the latent activations z = ReLU(W_enc · x_j^l) by either retaining or suppressing specific neurons. Let T_k(c) denote the top-k scoring neurons for class c. To retain only the top neurons of both classes, we apply
z′ = z ⊙ m, where m_i = 1 if i ∈ T_k(0) ∪ T_k(1), and m_i = 0 otherwise.
To suppress those neurons instead, we use the complementary mask:
z′ = z ⊙ (1 − m)
By comparing the model's outputs before and after intervention, we assess whether the top neurons carry meaningful information or reflect confounding artifacts.
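A sketch of both interventions is given below, reusing the SparseAutoencoder from the Step 2 sketch. Decoding the masked latent back to feature space before the downstream head is an assumption about how the intervened features re-enter the model.

```python
# Sketch of Step 5: retain or suppress the top class-aligned neurons, then decode.
# `sae` is the trained SparseAutoencoder from the Step 2 sketch; `keep_idx` is
# the concatenation of T_k(0) and T_k(1) from the Step 3 sketch.
import torch

def intervene(sae, x, keep_idx, suppress: bool = False):
    z = torch.relu(sae.enc(x))            # z = ReLU(W_enc x_j^l)
    m = torch.zeros_like(z)
    m[:, keep_idx] = 1.0                  # m_i = 1 for i in T_k(0) ∪ T_k(1)
    z_prime = z * (1.0 - m) if suppress else z * m
    return sae.dec(z_prime)               # reconstructed features for the downstream model
```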
Dataset. We use the VinDr-Mammo dataset [Nguyen et al., 2023], which contains approximately 20,000 full-field digital mammograms from 5,000 four-view exams. The dataset includes expert annotations for breast-specific findings such as mass and suspicious calcification.
SAE Training. We train a single Sparse Autoencoder (SAE) on patch-level features extracted from the fine-tuned Mammo-CLIP model using the Vision-SAEs library. Specifically, we extract activations from the final layer of the EfficientNet-B5 backbone trained on the suspicious calcification classification task.
To ensure consistency and reduce computational overhead, we use a shared SAE across all experiments rather than training separate SAEs per model. This design ensures a common latent space and facilitates direct comparison across settings.
Hyperparameters. Input feature dimension is set to d = 2048, and we use an expansion factor of 8, resulting in a latent dimension h = 16,384. The SAE is trained for 200 epochs with a learning rate of 3 × 10−4, sparsity penalty λ = 3 × 10−5, and a batch size of 4096.
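For reference, these hyperparameters are collected into a single configuration sketch below; the key names are illustrative and not the Vision-SAEs library's actual arguments.

```python
# SAE training hyperparameters from the paper, gathered into one config dict.
sae_config = {
    "d_in": 2048,            # EfficientNet-B5 feature dimension d
    "expansion_factor": 8,   # latent dimension h = 8 * 2048 = 16384
    "epochs": 200,
    "lr": 3e-4,              # learning rate
    "l1_lambda": 3e-5,       # sparsity penalty λ
    "batch_size": 4096,
}
```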
@InProceedings{Nakka_2025_MICCAI,
author = {Nakka, Krishna Kanth},
title = {Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders},
booktitle = {Proceedings of the Deep Breast Workshop on AI and Imaging for Diagnostic and Treatment Challenges in Breast Care, MICCAI 2025},
month = {September},
year = {2025},
}
This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. We are thankful to CDA and BIA for releasing the pretrained models.