Mammo-SAE

Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse AutoEncoders
[DeepBreath, MICCAI 2025]

Bavaria, Germany
Mammo-SAE Framework

Mammo-SAE Framework. The SAE is first trained on patch-level CLIP features x_j ∈ ℝ^d at any given layer, projecting them into a high-dimensional, interpretable sparse latent space z ∈ ℝ^h, and decoding them back for reconstruction. Once trained, the SAE is used to analyze which latent neurons are activated and what semantic information they encode. We also perform targeted interventions in the latent neuron space to assess their influence on downstream label prediction. We observe that the learned latents capture diverse regions such as nipple regions, mass regions, and background areas. Red boxes indicate ground-truth mass localization.

🔥 Highlights

  1. Mammo-SAE Introduction. We propose Mammo-SAE, a sparse autoencoder trained on visual features from Mammo-CLIP, a vision–language model pretrained on mammogram image–report pairs. Mammo-SAE aims to enhance interpretability in breast imaging by learning latent neurons that are human-interpretable. It first identifies highly activated latent neurons and then conducts interventions to understand their causal effects.

  2. Mammo-SAE Framework. Our framework projects patch-level CLIP features into a high-dimensional sparse latent space designed for human interpretability. We identify monosemantic latent neurons whose activations align with meaningful breast cancer features such as masses and calcifications by finding the most highly activated neurons. We then perform targeted interventions to test how these neurons affect label predictions by selectively activating or deactivating groups of neurons.

  3. Extensive Evaluation. We visualize spatial activations of top-activated latent neurons, showing that they frequently match clinically relevant regions of interest. Our experiments also reveal confounding factors, such as background patterns, that influence model decisions. Furthermore, fine-tuning Mammo-CLIP increases the activation of clinically relevant latent neurons, which explains the observed performance gains.

🧠 Mammo-SAE Method Overview

Step 1: Feature Extraction

Given an input image I, we extract local features from a pretrained Mammo-CLIP model at a specific layer l. Each spatial position j in the feature map yields a vector x_j^l ∈ ℝ^d, where d is the feature dimension and N_l = H_l × W_l is the number of spatial locations.
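Concretely, this step amounts to flattening a convolutional feature map into a set of patch vectors. A minimal numpy sketch with illustrative shapes (the toy dimensions here are not taken from Mammo-CLIP):

```python
import numpy as np

# Hypothetical feature map from layer l: (channels d, height H_l, width W_l)
d, H_l, W_l = 32, 7, 7
feat = np.random.default_rng(0).normal(size=(d, H_l, W_l))

# Flatten the spatial grid into N_l = H_l * W_l patch vectors x_j of dimension d
patches = feat.reshape(d, H_l * W_l).T   # shape (N_l, d); row j is x_j^l
```

Each row of `patches` is then fed independently to the SAE described next.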

Step 2: Sparse Autoencoder Training

The extracted feature x_j^l is encoded using a weight matrix W_enc ∈ ℝ^{d×h}, passed through a ReLU nonlinearity, and decoded using W_dec ∈ ℝ^{h×d}. The training objective combines reconstruction and sparsity:

ℒ(x_j^l) = ‖x_j^l − x̂_j^l‖₂² + λ‖z‖₁        (Equation 1)

This encourages the autoencoder to reconstruct input features while activating only a small number of latent neurons, enabling interpretability.
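A toy numpy sketch of the forward pass and objective, with untrained random weights and toy dimensions (the paper uses d = 2048, h = 16384, and λ = 3 × 10⁻⁵; biases are omitted here for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 16, 64            # toy sizes; the paper uses d = 2048, h = 16384
lam = 3e-5               # sparsity penalty λ from the paper
W_enc = rng.normal(scale=0.1, size=(d, h))
W_dec = rng.normal(scale=0.1, size=(h, d))

def sae_forward(x):
    z = np.maximum(x @ W_enc, 0.0)   # ReLU encoder: z ∈ R^h, nonnegative
    x_hat = z @ W_dec                # linear decoder back to R^d
    return z, x_hat

def sae_loss(x):
    z, x_hat = sae_forward(x)
    recon = np.sum((x - x_hat) ** 2, axis=-1).mean()  # reconstruction term
    sparsity = np.sum(np.abs(z), axis=-1).mean()      # L1 sparsity term
    return recon + lam * sparsity

x = rng.normal(size=(8, d))          # a batch of patch features
loss = sae_loss(x)
```

In practice the weights are optimized by gradient descent on this loss; the sketch only shows how the two terms are computed.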

Step 3: Identifying Concept-Neurons

After training, we compute the class-wise mean latent activation z̄(c) ∈ ℝ^h over all examples in class c ∈ {0, 1}:

z̄(c) = (1 / |𝒟_c|) Σ_{i ∈ 𝒟_c} z_i        (Equation 2)

Each latent neuron t is scored by its activation s_t(c) = z̄_t(c), and top-scoring neurons are considered concept-aligned.
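The scoring step reduces to a masked mean followed by a sort. A small self-contained sketch with made-up activations:

```python
import numpy as np

def class_mean_activation(z, labels, c):
    """Mean latent vector z_bar(c) over all examples with label c."""
    return z[labels == c].mean(axis=0)

def top_k_neurons(z_bar, k):
    """Indices t of the k largest scores s_t(c) = z_bar_t(c)."""
    return np.argsort(z_bar)[::-1][:k]

# Toy example: neuron 2 fires strongly for class 1
z = np.array([[0.0, 1.0, 0.1],
              [0.2, 0.0, 5.0],
              [0.1, 0.9, 4.0]])
labels = np.array([0, 1, 1])
z_bar = class_mean_activation(z, labels, 1)   # [0.15, 0.45, 4.5]
print(top_k_neurons(z_bar, 1))                # [2]
```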

Step 4: Visualization and Semantic Probing

We visualize input patches that strongly activate each latent neuron. This reveals whether the neuron focuses on meaningful clinical patterns (e.g., masses, calcifications) or irrelevant areas.

Step 5: Latent Interventions

We intervene on the latent activations z = ReLU(W_enc · x_j) by either retaining or suppressing specific neurons:

  • Top-k Activated: Keep only the top-k neuron activations:

    ẑ_t = z_t if t ∈ TopK(c), and 0 otherwise        (Equation 3)


  • Top-k Deactivated: Suppress the top-k neurons:

    ẑ_t = 0 if t ∈ TopK(c), and z_t otherwise        (Equation 4)

By comparing the model's outputs before and after intervention, we assess whether the top neurons carry meaningful information or reflect confounding artifacts.
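Both interventions are simple masking operations on the latent vector. A minimal sketch (the decoder and classifier that consume the edited latents are omitted):

```python
import numpy as np

def intervene_activate(z, top_idx):
    """Top-k activated: keep only neurons in top_idx, zero out the rest (Equation 3)."""
    out = np.zeros_like(z)
    out[..., top_idx] = z[..., top_idx]
    return out

def intervene_deactivate(z, top_idx):
    """Top-k deactivated: zero out neurons in top_idx, keep the rest (Equation 4)."""
    out = z.copy()
    out[..., top_idx] = 0.0
    return out

z = np.array([0.5, 0.0, 2.0, 1.0])
top = [2, 3]
print(intervene_activate(z, top).tolist())    # [0.0, 0.0, 2.0, 1.0]
print(intervene_deactivate(z, top).tolist())  # [0.5, 0.0, 0.0, 0.0]
```

The edited latents are then decoded and passed onward, and the change in the downstream prediction is measured.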

📊 Experiments

Dataset. We use the VinDr-Mammo dataset, which contains approximately 20,000 full-field digital mammograms from 5,000 patients. The dataset includes expert annotations for breast-specific findings such as mass and suspicious calcification.

SAE Training. A single Sparse Autoencoder (SAE) is trained on patch-level features extracted from the fine-tuned Mammo-CLIP model using the Vision-SAEs library. Activations are taken from the final layer of the EfficientNet-B5 backbone trained on the suspicious calcification classification task.

To ensure consistency and reduce computational overhead, a shared SAE is used across all experiments rather than training separate SAEs per model. This design enforces a common latent space and allows direct comparison across settings.

Hyperparameters. The input feature dimension is d = 2048, with an expansion factor of 8, resulting in a latent dimension h = 16,384. The SAE is trained for 200 epochs with a learning rate of 3 × 10−4, sparsity penalty λ = 3 × 10−5, and a batch size of 4096.

Group Interventions on Latent Neurons

We investigate the role of class-level latent neurons by selectively activating or deactivating the most influential ones. This helps us understand how specific neuron groups contribute to downstream predictions and model interpretability.

Figure: Group interventions on class-level latent neurons for the Suspicious Calcification and Benign Mass tasks. (a) Top-k activated intervention: only the top-k class-specific neurons are retained, and all others are zeroed out. (b) Top-k deactivated intervention: the top-k neurons are zeroed out while the rest remain unchanged. We observe that as few as 10 neurons can significantly affect downstream predictions, highlighting their relevance.

Spatial Alignment Between SAE Neurons and Breast Concept Regions

To quantitatively evaluate the spatial alignment between SAE latent activations and annotated breast concept regions, we threshold each latent heatmap at the 95th percentile and extract rectangular bounding boxes to approximate predicted concept locations.
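The thresholding-and-boxing step can be sketched as follows, on a synthetic heatmap; the `heatmap_to_box` and `iou` helpers are illustrative names, not functions from the paper's codebase:

```python
import numpy as np

def heatmap_to_box(heatmap, pct=95):
    """Threshold a latent activation heatmap at the given percentile and
    return a tight bounding box (x0, y0, x1, y1) around surviving pixels."""
    thr = np.percentile(heatmap, pct)
    ys, xs = np.nonzero(heatmap >= thr)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def iou(a, b):
    """Intersection-over-union of two inclusive-pixel boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0 + 1) * max(0, iy1 - iy0 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / float(area(a) + area(b) - inter)

hm = np.zeros((10, 10))
hm[3:6, 4:7] = 1.0             # synthetic activation blob
box = heatmap_to_box(hm)       # (4, 3, 6, 5)
print(iou(box, (4, 3, 6, 5)))  # 1.0
```

Predicted boxes are matched against ground-truth annotations at a chosen IoU threshold to compute the mAP reported below.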

Figure: Mean Average Precision (mAP) for breast concept localization using the top-10 class-level latent neuron activations for class c = 1 across different settings.


Visualization of Class-Level Latent Neurons

We visualize spatial activations of the most influential latent neurons in the fine-tuned and pretrained models for mass and calcification concept predictions, and examine whether they align with clinically meaningful regions. Ground-truth concept regions are shown in red boxes; however, this information is never used during training.

Visualization of class-level latent neurons for Finetuned (Suspicious Calcification) model

Latent visualization of Suspicious Calcification

Visualization of class-level latent neurons for Finetuned (Mass) model

Latent visualization of Mass

Visualization of class-level latent neurons for Pretrained (Suspicious Calcification) model

Latent visualization of Suspicious Calcification

Visualization of class-level latent neurons for Pretrained (Mass) model

Latent visualization of Mass

Latent Neuron Separation: Fine-Tuned vs. Pretrained

We observe that the separation between class-wise mean activations becomes significantly more pronounced in the fine-tuned model, suggesting that fine-tuning sharpens the latent space to better distinguish the presence of breast concepts.

Figure: Mean latent activation vectors for each class (c = 1 indicates the presence of the concept) in the pretrained model (left) and fine-tuned model (right) for the suspicious calcification concept.

Latent neuron separation between pretrained and fine-tuned models

Citation


@InProceedings{Nakka_2025_MICCAI,
  author    = {Nakka, Krishna Kanth},
  title     = {Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders},
  booktitle = {Proceedings of the Deep Breast Workshop on AI and Imaging for Diagnostic and Treatment Challenges in Breast Care, MICCAI 2025},
  month     = {September},
  year      = {2025},
}

Acknowledgement

This website is adapted from Nerfies, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. We thank CDA and BIA for releasing the pretrained models.