I’m currently working in the Privacy Team at the Trustworthy Technology Lab, Huawei Munich Research Center, where I focus on the privacy and safety of large language models (LLMs). My current research includes studying privacy leakage in LLMs, unlearning of sensitive information, text anonymization, and understanding LLMs through mechanistic interpretability.
I graduated with a PhD in Computer Science in August 2022 from the Computer Vision Lab at EPFL. I was supervised by Dr. Mathieu Salzmann and Prof. Pascal Fua. My thesis focused on the robustness and interpretability of ML models.
Following the completion of my PhD, I worked as a postdoctoral scientist at the Visual Intelligence for Transportation Lab (VITA) at EPFL, under the supervision of Prof. Alexandre Alahi, for eight months, until April 2023.
My research interests lie in developing models that are robust and interpretable, particularly for safety- and security-critical applications. Currently, my work focuses on enhancing the privacy of Large Language Models (LLMs). I am particularly interested in understanding the causes of memorization and privacy leakage in LLMs and exploring interpretable methods to mitigate these issues.
During my PhD, I investigated the vulnerabilities of deep neural networks, especially their performance in unexpected or adversarial scenarios, to improve their robustness. My research spanned topics such as explainable models, transfer-based black-box attacks, attack detection, adversarial defenses, anomaly detection, and testing disentangled representations. At VITA, I worked on human pose estimation, tracking, and re-identification, primarily in the context of team sports analytics.
We introduce PrivacyScalpel, a novel privacy-preserving framework that leverages LLM interpretability techniques, such as steering and sparse autoencoders, to identify and mitigate PII leakage while maintaining performance.
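As a rough illustration of the kind of sparse-autoencoder intervention this relies on, here is a minimal sketch; the toy SAE, the chosen layer, and the `pii_features` indices are hypothetical placeholders, not the actual PrivacyScalpel components:

```python
import torch
import torch.nn as nn

# Toy dimensions; a real SAE would be trained on one layer's residual stream.
d_model, d_sae = 768, 4096
encoder = nn.Linear(d_model, d_sae)
decoder = nn.Linear(d_sae, d_model)

# Hypothetical indices of SAE features found (e.g., via probing) to carry PII signal.
pii_features = torch.tensor([12, 305, 1999])

def ablate_pii_features(hidden: torch.Tensor) -> torch.Tensor:
    """Encode hidden states into sparse features, zero the PII-linked ones,
    and decode back, keeping the SAE's reconstruction residual."""
    acts = torch.relu(encoder(hidden))        # sparse feature activations
    residual = hidden - decoder(acts)         # part of the signal the SAE misses
    acts[..., pii_features] = 0.0             # remove the selected features
    return decoder(acts) + residual           # edited hidden state

# In practice this function would run inside a forward hook on one transformer layer.
hidden = torch.randn(1, 16, d_model)          # dummy (batch, seq, d_model) states
print(ablate_pii_features(hidden).shape)      # torch.Size([1, 16, 768])
```

Keeping the reconstruction residual confines the edit to the selected features rather than replacing the hidden state with the SAE reconstruction outright.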
We introduce Neuron Attack for Transferability (NAT), a method designed to target specific neurons within the feature embedding. Our approach is motivated by the observation that previous layer-level optimizations often disproportionately focus on a few neurons representing similar concepts, leaving other neurons within the attacked layer minimally affected. NAT therefore shifts the focus from embedding-level separation to a more fundamental, neuron-specific objective.
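A minimal sketch of what a neuron-level transfer-attack objective might look like; the tiny surrogate network, the target channel, and this particular loss are illustrative assumptions, not the NAT formulation:

```python
import torch
import torch.nn as nn

# Tiny surrogate feature extractor standing in for a pretrained backbone.
surrogate = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)

image = torch.rand(1, 3, 32, 32)
epsilon, step, n_iters = 8 / 255, 1 / 255, 10
target_neuron = 7                                      # hypothetical channel index
delta = torch.empty_like(image).uniform_(-epsilon, epsilon).requires_grad_()

with torch.no_grad():
    clean_act = surrogate(image)[:, target_neuron]     # the neuron's clean response

for _ in range(n_iters):
    adv_act = surrogate(torch.clamp(image + delta, 0, 1))[:, target_neuron]
    # Neuron-specific objective: push this single neuron's activation away from
    # its clean value, instead of separating the whole embedding.
    loss = -(adv_act - clean_act).pow(2).mean()
    loss.backward()
    with torch.no_grad():
        delta -= step * delta.grad.sign()
        delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
```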
We introduce PII-Scope, a comprehensive benchmark designed to evaluate state-of-the-art methodologies for PII extraction attacks targeting LLMs across diverse threat settings. Our study provides a deeper understanding of these attacks by uncovering several hyperparameters (e.g., demonstration selection) that are crucial to their effectiveness. We show that, with sophisticated adversarial capabilities and a limited query budget, PII extraction rates can increase by up to fivefold when targeting the pretrained model.
We propose ObfuscaTune, a novel, efficient, and fully utility-preserving approach that combines a simple yet effective obfuscation technique with an efficient use of confidential computing (only 5% of the model parameters are placed in the TEE) to protect LLM model ownership and client data privacy.
We propose an LLM-based anonymization technique, IncogniText, that anonymizes text to mislead a potential adversary into predicting a wrong private attribute value.
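A rough sketch of the misleading-rewrite idea; the prompt wording and the `target_attribute` / `misleading_value` arguments are invented for illustration and are not the actual IncogniText prompts:

```python
def build_anonymization_prompt(text: str, target_attribute: str, misleading_value: str) -> str:
    """Builds an instruction for a rewriting LLM: keep the message intact, but
    make an attribute-inference adversary conclude the wrong attribute value."""
    return (
        "Rewrite the following text while preserving its meaning and utility.\n"
        f"Remove or alter any cue that reveals the author's {target_attribute}, "
        f"and instead make a reader plausibly infer the value '{misleading_value}'.\n\n"
        f"Text:\n{text}"
    )

prompt = build_anonymization_prompt(
    "Just got back from the lake house, the kids loved the boat trip!",
    target_attribute="age range",
    misleading_value="18-24",
)
print(prompt)  # this instruction would then be sent to the rewriting LLM
```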
We empirically demonstrate that PII extractability can be improved more than tenfold by grounding the prefix of the manually constructed extraction prompt with in-domain data.
In this paper, we take a deeper look at reward-based strategies and systematically analyze them, uncovering several issues and challenges associated with their adoption in practice. Furthermore, motivated by the insights from our analysis, we propose an in-depth evaluation of the policy distribution with metrics that capture rankings of standalone configurations.
Our analyses show that disentanglement in the three state-of-the-art disentangled representation learning frameworks is far from complete, and that their pose codes contain significant appearance information.
We propose to learn to generate a single perturbation from the object template only, which can be added to every search image and still successfully fool the tracker for the entire video. As a consequence, the resulting generator outputs perturbations that are quasi-independent of the template, thereby making them universal perturbations.
We show that generators trained with a mid-level feature separation loss transfer significantly better in cross-model, cross-domain, and cross-task settings.
We improve robustness by introducing an attention-based regularization mechanism that maximally separates the latent features of discriminative regions of different classes, while minimizing the contribution of non-discriminative regions to the final class prediction.
We show that the resulting networks are sensitive not only to global attacks, where perturbations affect the entire input image, but also to indirect local attacks, where perturbations are confined to a small image region that does not overlap with the area we aim to fool.
We rely on the intuition that the network will produce spurious labels in regions depicting unexpected, anomalous objects. Therefore, resynthesizing the image from the resulting semantic map will yield significant appearance differences with respect to the input image, which we detect with an auxiliary network.
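A toy sketch of that detection pipeline, with one-layer stand-ins for the segmentation, resynthesis, and discrepancy networks (the real models are of course much larger, and the class count is a made-up example):

```python
import torch
import torch.nn as nn

# Stand-in modules for: a semantic segmentation network, a semantic-map-to-image
# synthesis network, and a learned discrepancy network.
segmenter   = nn.Conv2d(3, 19, 1)       # image -> per-pixel class logits (19 classes, hypothetical)
resynthesis = nn.Conv2d(19, 3, 1)       # semantic map -> reconstructed image
discrepancy = nn.Conv2d(6, 1, 1)        # (image, reconstruction) -> per-pixel anomaly score

def anomaly_map(image: torch.Tensor) -> torch.Tensor:
    """Anomalous objects get spurious labels, so the resynthesized image
    differs visibly from the input wherever the anomaly lies."""
    sem = segmenter(image).softmax(dim=1)             # predicted semantic map
    recon = resynthesis(sem)                          # image re-generated from the labels
    return discrepancy(torch.cat([image, recon], 1))  # where do input and reconstruction disagree?

print(anomaly_map(torch.rand(1, 3, 64, 64)).shape)    # torch.Size([1, 1, 64, 64])
```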
We build upon the intuition that, while adversarial samples look very similar to real images, they should activate codewords with a significantly different visual representation in order to produce incorrect predictions. We therefore cast adversarial example detection as the problem of comparing the input image with the most highly activated visual codeword.
We introduce an attentional structured representation learning framework that incorporates an image-specific attention mechanism within the aggregation process.
We use knowledge of the spatial locations of fences to subsequently estimate occlusion-aware optical flow. We then fuse the occluded information from neighbouring frames by solving an inverse problem of denoising.
Our de-fencing approach proceeds as follows: (i) detection of the spatial locations of fences/occlusions in the frames of the video, (ii) estimation of the relative motion between the observations, and (iii) data fusion to fill in the occluded pixels in the reference image. We model the de-fenced image as a Markov random field and obtain its maximum a posteriori estimate by solving the corresponding inverse problem.
We propose a semi-automated de-fencing algorithm using a video of the dynamic scene. The inverse problem of fence removal is solved using the split Bregman technique, with the total variation of the de-fenced image as the regularization constraint.
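In equation form, this kind of fence-removal step is roughly the following TV-regularized inverse problem (the notation here is for illustration and is not copied from the paper):

```latex
\hat{x} = \arg\min_{x} \sum_{k} \big\| O_k \odot (y_k - W_k x) \big\|_2^2 + \lambda\, \mathrm{TV}(x),
\qquad \mathrm{TV}(x) = \|\nabla x\|_1
```

where x is the de-fenced image, y_k are the observed frames, W_k is the warp given by the estimated relative motion, O_k is a mask discarding fence pixels, and λ is the regularization weight; the split Bregman iterations handle the non-smooth TV term by introducing auxiliary splitting variables.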
We propose a framework for user-interactive segmentation of MRI of human leg muscles, built upon the strategy of bootstrapping with minimal supervision.
We show that, with a non-linear sampling scheme and a maximum entropy reconstruction technique in HYSCORE, experimental times can be shortened by approximately an order of magnitude compared to conventional linear sampling, with negligible loss of information.
Scholarships
I'm deeply grateful for the generous scholarships I received throughout my academic journey. These include: