NAT

NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability
[WACV 2025]

VITA Lab, EPFL, Switzerland

On the left, we illustrate how prior single-generator methods, such as LTP and BIA, attack the entire embedding but predominantly disrupt neurons related to a single concept, like a circular text pattern, while leaving most other neurons largely unaffected. In contrast, our framework on the right trains multiple generators to target individual neurons, each representing a distinct concept. By focusing on attacking neurons that represent low-level concepts, our method not only generates highly transferable perturbations but also produces diverse, complementary attack patterns. Best viewed in color and zoomed in.

🔥Highlights

  1. NAT Introduction. The generation of transferable adversarial perturbations typically involves training a generator to maximize the embedding separation between clean and adversarial images at a single mid-layer of a source model. In this work, we build on this approach and introduce Neuron Attack for Transferability (NAT), a method designed to target specific neurons within the embedding. Our approach is motivated by the observation that previous layer-level optimizations often disproportionately focus on a few neurons representing similar concepts, leaving the other neurons within the attacked layer minimally affected. NAT shifts the focus from embedding-level separation to a more fundamental, neuron-specific approach. We find that targeting individual neurons effectively disrupts the core units of the neural network, providing a common basis for transferability across different models.

  2. NAT Framework. NAT trains a UNet-based perturbation generator that takes input images and generates adversarial images in a single forward pass at attack time. To train the generator, NAT maximizes the L2 separation between the clean and adversarial activations of a specific neuron (i.e., channel) in the feature map of the attacked layer; a minimal training-step sketch is given after this list.

  3. Extensive Evaluation. We conduct a rigorous evaluation on 41 ImageNet-pretrained models, covering 16 traditional convolutional networks, 8 Transformer architectures, and 17 hybrid architectures. We also evaluate transferability on models trained on nine fine-grained datasets. We demonstrate that a single neuron-specific adversarial generator achieves over a 14% improvement in transferability in cross-model settings and a 4% improvement in cross-domain settings. Additionally, by leveraging the complementary attack capabilities of NAT's generators, adversarial transferability can be significantly enhanced with fewer than 10 queries to the target model (see the query-budget sketch below).
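
The snippet below is a minimal PyTorch sketch of the neuron-specific training objective described in Highlight 2. It is an illustration under assumptions, not the released NAT code: the stand-in generator, the VGG-16 mid-layer index, the attacked channel index, and the perturbation budget are hypothetical placeholders.

  # Hypothetical sketch of NAT's neuron-specific separation loss (not the official code).
  import torch
  from torchvision import models

  EPS = 16 / 255     # L_inf perturbation budget (assumed)
  NEURON_IDX = 42    # index of the attacked channel, i.e., the "neuron" (hypothetical)
  MID_LAYER = 16     # cut-off inside the source model's feature stack (hypothetical)

  class TinyGenerator(torch.nn.Module):
      # Stand-in for the paper's UNet-based generator; the real architecture differs.
      def __init__(self):
          super().__init__()
          self.net = torch.nn.Sequential(
              torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
              torch.nn.Conv2d(16, 3, 3, padding=1), torch.nn.Tanh(),
          )
      def forward(self, x):
          return x + self.net(x)  # predict a bounded residual perturbation

  # Source-model features up to a mid-layer, frozen while the generator is trained.
  source = models.vgg16(weights="IMAGENET1K_V1").features[:MID_LAYER].eval()
  for p in source.parameters():
      p.requires_grad_(False)

  generator = TinyGenerator()
  optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4)

  def training_step(x_clean):
      # A single forward pass of the generator produces the adversarial image,
      # projected to the L_inf ball around the clean image and the valid range.
      x_adv = generator(x_clean)
      x_adv = torch.min(torch.max(x_adv, x_clean - EPS), x_clean + EPS)
      x_adv = torch.clamp(x_adv, 0, 1)

      # Activations of one channel (neuron) of the attacked feature map.
      f_clean = source(x_clean)[:, NEURON_IDX]
      f_adv = source(x_adv)[:, NEURON_IDX]

      # Maximize the L2 separation of that neuron between clean and adversarial inputs.
      loss = -torch.norm(f_adv - f_clean, p=2)
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
      return loss.item()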

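The second sketch illustrates one way the complementary generators mentioned in Highlight 3 could be combined under a small query budget: each trained neuron-specific generator is queried against the black-box target model in turn until its prediction flips. The function name and the fixed query order are assumptions; the paper's exact query strategy may differ.

  import torch

  @torch.no_grad()
  def few_query_attack(x, y_true, generators, target_model, eps=16 / 255, max_queries=10):
      # Try the adversarial image of each neuron-specific generator on the target
      # model and stop as soon as the prediction no longer matches the true label.
      x_adv = x
      for queries, gen in enumerate(generators[:max_queries], start=1):
          x_adv = torch.clamp(torch.min(torch.max(gen(x), x - eps), x + eps), 0, 1)
          pred = target_model(x_adv).argmax(dim=-1)
          if (pred != y_true).all():
              return x_adv, queries      # success within the query budget
      return x_adv, max_queries          # budget exhausted without success
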
Citation


  @InProceedings{Nakka_2025_WACV,
    author    = {Nakka, Krishna Kanth and Alahi, Alexandre},
    title     = {NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {7582-7593}
}
  

Acknowledgement

This website is adapted from Nerfies, which is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. We are grateful to the authors of CDA and BIA for releasing their pretrained models.
