Our NAT framework focuses on training multiple perturbation generators, each targeting a specific neuron within a chosen layer of the source model. During training, each generator learns to maximize the $L_2$ separation between the clean and adversarial feature responses at the targeted neuron. This neuron-specific approach allows NAT to effectively disrupt distinct concepts represented by individual neurons, leading to the generation of diverse and complementary adversarial patterns. At inference time, these generators can be employed independently or in combination to produce adversarial examples that exhibit high transferability across various target models.
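The sketch below illustrates one possible training step for a single neuron-specific generator, assuming a PyTorch setup. The names `nat_training_step`, `generator`, and `feature_extractor`, the $\epsilon$ bound of 16/255, and the interpretation of a "neuron" as one feature channel of the chosen layer are illustrative assumptions, not the paper's exact implementation or hyper-parameters.

```python
import torch

def nat_training_step(generator, feature_extractor, images, neuron_idx,
                      optimizer, eps=16/255):
    """One training step for a single neuron-specific generator (sketch).

    `generator` maps clean images to adversarial images; `feature_extractor`
    returns activations of the chosen source-model layer with shape
    (B, C, H, W). The loss maximizes the L2 separation between clean and
    adversarial activations at channel `neuron_idx` (assumed to represent
    the targeted neuron).
    """
    # Generate an adversarial image and keep it within an L-infinity ball.
    adv = generator(images)
    adv = torch.min(torch.max(adv, images - eps), images + eps).clamp(0, 1)

    with torch.no_grad():
        feat_clean = feature_extractor(images)            # (B, C, H, W)
    feat_adv = feature_extractor(adv)

    # L2 separation restricted to the targeted neuron (feature channel).
    diff = feat_adv[:, neuron_idx] - feat_clean[:, neuron_idx]
    loss = -diff.flatten(1).norm(p=2, dim=1).mean()       # maximize separation

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training one such generator per selected neuron yields the bank of complementary attackers that can later be used individually or together.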
Figure 2: Overview of the NAT framework for training neuron-specific perturbation generators.
Evaluation in Cross-Model Setting: We present a comprehensive quantitative evaluation of our NAT method against the state-of-the-art baselines LTP and BIA across a diverse set of 41 ImageNet-pretrained models. Specifically, NAT achieves over a 14% improvement in transferability in the cross-model setting and a 4% improvement in the cross-domain setting.
Evaluation in Cross-Domain Setting: We report the adversarial accuracy (in %) across three fine-grained datasets. Our generators substantially outperform the baseline models in deceiving the target networks in both the single-query ($k = 1$) and multi-query ($k = 10$ and $k = 40$) settings, as sketched below.
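As a rough illustration of the multi-query setting, the sketch below runs one query per neuron-specific generator against a target model. The function name `multi_query_success`, the any-of-$k$ success criterion, and the $\epsilon$ bound are assumptions made for illustration and may differ from the paper's exact evaluation protocol.

```python
import torch

@torch.no_grad()
def multi_query_success(generators, target_model, images, labels, eps=16/255):
    """Sketch of a k-query evaluation: one query per neuron-specific generator.

    An image counts as successfully attacked if at least one of the k
    generators produces an adversarial example that the target model
    misclassifies (any-of-k criterion, assumed here for illustration).
    """
    fooled = torch.zeros(images.size(0), dtype=torch.bool, device=images.device)
    for g in generators:                                    # k queries total
        adv = g(images)
        adv = torch.min(torch.max(adv, images - eps), images + eps).clamp(0, 1)
        preds = target_model(adv).argmax(dim=1)
        fooled |= preds.ne(labels)                          # success on any query
    return fooled.float().mean().item()                     # attack success rate
```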
Figure 3: Transferability Heatmap. Cross-architecture evaluation showing adversarial transferability from our neuron-specific generators (y-axis) across 41 target models (x-axis).
We present qualitative comparisons of adversarial examples generated by our NAT method against those produced by state-of-the-art baselines, LTP and BIA. The results highlight the superior transferability of perturbations generated by NAT across various target models, including ConvNeXt, DeiT, and BEiT. Notably, NAT generates perturbations that are more visually diverse and effective in misleading different architectures, demonstrating its robustness and versatility in adversarial attacks.
@InProceedings{Nakka_2025_WACV,
    author    = {Nakka, Krishna Kanth and Alahi, Alexandre},
    title     = {NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {7582-7593}
}
This website is adapted from Nerfies, licensed under a CC BY-SA 4.0 License. We thank CDA and BIA for releasing their pretrained models.