Mammo-SAE
Mammo-SAE is a sparse autoencoder trained on visual features from Mammo-CLIP, a vision–language model pretrained on mammogram image–report pairs. It aims to enhance interpretability in breast imaging by learning human-interpretable latent neurons: it first identifies highly activated latent neurons and then intervenes on them to understand their causal effects.
Step 2: Sparse Autoencoder Training
The extracted feature xj ∈ ℝd is encoded using the weight matrix Wenc ∈ ℝd×h, passed through a ReLU nonlinearity, and decoded using Wdec ∈ ℝh×d. The training objective combines a reconstruction term and a sparsity term:

z = ReLU(Wenc · xj),  x̂j = Wdec · z,  L = ‖xj − x̂j‖² + λ‖z‖₁
This encourages the autoencoder to reconstruct input features while activating only a small number of latent neurons, enabling interpretability.
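The objective above can be sketched as a small PyTorch module. This is a minimal illustration assuming the standard reconstruction-plus-L1 formulation; the class and function names are ours, not from the released code.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: encode with ReLU, decode linearly."""
    def __init__(self, d: int, h: int):
        super().__init__()
        self.enc = nn.Linear(d, h)   # encoder weights (Wenc)
        self.dec = nn.Linear(h, d)   # decoder weights (Wdec)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.enc(x))  # sparse latent code z ∈ ℝ^h
        x_hat = self.dec(z)          # reconstruction of the input feature
        return x_hat, z

def sae_loss(x, x_hat, z, lam: float = 3e-5):
    recon = ((x - x_hat) ** 2).sum(dim=-1).mean()  # reconstruction term
    sparsity = z.abs().sum(dim=-1).mean()          # L1 sparsity term
    return recon + lam * sparsity
```

The L1 penalty drives most latent coordinates to zero, which is what makes individual neurons inspectable.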
Step 3: Identifying Concept-Neurons
After training, we compute the class-wise mean latent activation z̄(c) ∈ ℝh over all examples in class c ∈ {0, 1}:

z̄(c) = (1/Nc) ∑j : yj = c  zj,  where Nc is the number of examples in class c.
Each latent neuron t is scored by its activation st(c) = z̄t(c), and top-scoring neurons are considered concept-aligned.
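The scoring step can be sketched in a few lines. This is an illustrative helper, assuming latents are stored as an (N, h) tensor with per-example class labels; the function name is ours.

```python
import torch

def top_concept_neurons(z: torch.Tensor, labels: torch.Tensor, c: int, k: int = 10):
    """Score neurons by class-wise mean activation and return the top k.

    z: (N, h) latent activations; labels: (N,) class ids; c: target class.
    """
    z_bar = z[labels == c].mean(dim=0)   # class-wise mean activation z̄(c)
    scores, idx = torch.topk(z_bar, k)   # top-scoring (concept-aligned) neurons
    return idx, scores
```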
Step 4: Visualization and Semantic Probing
We visualize input patches that strongly activate each latent neuron. This reveals whether the neuron focuses on meaningful clinical patterns (e.g., masses, calcifications) or irrelevant areas.
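Retrieving the patches to display reduces to ranking patch-level latents for one neuron. A minimal sketch, where `sae_encode` is a hypothetical callable mapping patch features to SAE latents:

```python
import torch

def top_activating_patches(patch_feats: torch.Tensor, sae_encode, neuron: int, k: int = 8):
    """Return indices of the k patches that most strongly activate one latent neuron.

    patch_feats: (N_patches, d) features; sae_encode: features -> (N_patches, h) latents.
    """
    z = sae_encode(patch_feats)
    acts = z[:, neuron]                # activations of the chosen neuron
    vals, idx = torch.topk(acts, k)    # patches to overlay on the mammogram
    return idx, vals
```

The returned indices are then mapped back to spatial patch locations for visual inspection.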
Step 5: Latent Interventions
We intervene on the latent activations z = ReLU(Wenc · xj) by either retaining or suppressing a selected set S of neurons: suppression sets z̃t = 0 for t ∈ S while leaving other neurons unchanged, whereas retention sets z̃t = 0 for all t ∉ S. The modified latent z̃ is then decoded and propagated through the rest of the model.
By comparing the model's outputs before and after intervention, we assess whether the top neurons carry meaningful information or reflect confounding artifacts.
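Both intervention modes amount to masking the latent code before decoding. A minimal sketch with illustrative names:

```python
import torch

def intervene(z: torch.Tensor, neurons: torch.Tensor, mode: str = "suppress"):
    """Suppress or retain a set of latent neurons.

    z: (N, h) latent activations; neurons: indices of the selected neurons.
    """
    z_mod = z.clone()
    if mode == "suppress":
        z_mod[:, neurons] = 0.0          # zero out the selected neurons
    elif mode == "retain":
        mask = torch.zeros_like(z_mod)
        mask[:, neurons] = 1.0
        z_mod = z_mod * mask             # keep only the selected neurons
    else:
        raise ValueError(f"unknown mode: {mode}")
    return z_mod
```

The modified latents are decoded with Wdec and fed back through the downstream classifier, whose output shift quantifies the causal contribution of the selected neurons.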
Dataset. We use the VinDr-Mammo dataset, which contains approximately 20,000 full-field digital mammograms from 5,000 patients. The dataset includes expert annotations for breast-specific findings such as mass and suspicious calcification.
SAE Training. A single Sparse Autoencoder (SAE) is trained on patch-level features extracted from the fine-tuned Mammo-CLIP model using the Vision-SAEs library. Activations are taken from the final layer of the EfficientNet-B5 backbone trained on the suspicious calcification classification task.
To ensure consistency and reduce computational overhead, a shared SAE is used across all experiments rather than training separate SAEs per model. This design enforces a common latent space and allows direct comparison across settings.
Hyperparameters. The input feature dimension is d = 2048, with an expansion factor of 8, resulting in a latent dimension h = 16,384. The SAE is trained for 200 epochs with a learning rate of 3 × 10−4, sparsity penalty λ = 3 × 10−5, and a batch size of 4096.
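Collected into a single configuration for reference (the dictionary keys are our naming, not the paper's code):

```python
# SAE hyperparameters as reported above; the latent dimension follows
# from the input dimension times the expansion factor.
config = dict(
    d=2048,               # input feature dimension
    expansion_factor=8,
    h=2048 * 8,           # latent dimension = 16,384
    epochs=200,
    lr=3e-4,              # learning rate
    lam=3e-5,             # sparsity penalty λ
    batch_size=4096,
)
assert config["h"] == 16_384
```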
@InProceedings{Nakka_2025_MICCAI,
author = {Nakka, Krishna Kanth},
title = {Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders},
booktitle = {Proceedings of the Deep Breast Workshop on AI and Imaging for Diagnostic and Treatment Challenges in Breast Care, MICCAI 2025},
month = {September},
year = {2025},
}
This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. We thank CDA and BIA for releasing the pretrained models.