Postdoctoral Associate · AI Safety

Samuele Poppi

I study how AI systems change after deployment, adaptation, and misuse.

My work connects safety control, mechanistic understanding, and trustworthy evidence for language, vision-language, and generative models after they leave the lab.

AI Control · Model Change · Forensics

AI Safety Portfolio · CV · Scholar · GitHub · Email

AI Control Across Deployment

Control mechanisms and evaluations that survive fine-tuning, adaptation, and downstream customization.

The Mechanics of Model Change

Mechanistic and geometric views of how models shift under multilingual attacks and post-alignment updates.

Evidence, Forgetting & Forensics

XAI, unlearning, watermarking, and multimedia forensics as tools for evidence under uncertainty.

Featured Work

Recent work at the intersection of AI safety, robustness, and trustworthy generative models.

arXiv 2026 · Jun 26, 2026

A Gravitational Interpretation of Fine-Tuning Reversion

A geometric account of why post-alignment fine-tuning can pull model behavior back toward earlier training-history manifolds.

ECCV 2026 · Jun 1, 2026

SPQR: A Multi-Dimensional Benchmark for Safety Alignment under Benign Model Adaptation

A benchmark for testing whether text-to-image safety alignment still holds after benign fine-tuning and deployment-time adaptation.

NAACL 2025 · Oct 1, 2024

Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks

Fine-tuning attacks in one language can break multilingual safety alignment, suggesting safety information is partly language-agnostic.

ECCV 2024 · Nov 27, 2023

Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models

A method for reducing unsafe concept associations in CLIP while preserving useful vision-language behavior.

Papers & Preprints

My name appears in bold; underlined names are students I advised or co-advised on that specific project.

Postdoc · 7 papers

Current work from MBZUAI and post-PhD collaborations.

Samuele Poppi, Nils Lukas. A Gravitational Interpretation of Fine-Tuning Reversion. In arXiv 2026.

Sai Kartheek Reddy Kasu, Nils Lukas, Samuele Poppi. When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models. In arXiv 2026.

Mohammed Talha Alam, Nada Saadi, Fahad Shamshad, Nils Lukas, Karthik Nandakumar, Fakhri Karray, Samuele Poppi. SPQR: A Multi-Dimensional Benchmark for Safety Alignment under Benign Model Adaptation. In ECCV 2026.

Toluwani Aremu, Daniil Ognev, Samuele Poppi, Nils Lukas. Robust Safety Monitoring of Language Models via Activation Watermarking. In arXiv 2026.

Silvia Cappelletti, Tobia Poppi, Samuele Poppi, Zheng-Xin Yong, Diego Garcia-Olano, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara. Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Output Prefilling. In ICPR 2026.

Sarim Hashmi, Abdelrahman Elsayed, Mohammed Talha Alam, Samuele Poppi, Nils Lukas. Robust and Calibrated Detection of Authentic Multimedia Content. In arXiv 2025.

Toluwani Aremu, Noor Hussein, Munachiso Nwadike, Samuele Poppi, Jie Zhang, Karthik Nandakumar, Neil Gong, Nils Lukas. Mitigating Watermark Forgery in Generative Models via Randomized Key Selection. In arXiv 2025.

PhD · 6 papers

Work from my PhD years on safe, explainable, and trustworthy AI.

Samuele Poppi, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara. Unlearning Vision Transformers without Retaining Data via Low-Rank Decompositions. In ICPR 2024.

Samuele Poppi, Zheng-Xin Yong, Yifei He, Bobbie Chern, Han Zhao, Aobo Yang, Jianfeng Chi. Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks. In NAACL 2025.

Samuele Poppi, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara. Multiclass Unlearning for Image Classification via Weight Filtering. In IEEE Intelligent Systems 2024.

Samuele Poppi, Tobia Poppi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara. Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models. In ECCV 2024.

Samuele Poppi, Roberto Bigazzi, Niyati Rawal, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara. Towards Explainable Navigation and Recounting. In ICIAP 2023.

Federico Cocchi, Lorenzo Baraldi, Samuele Poppi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara. Unveiling the Impact of Image Transformations on Deepfake Detection: An Experimental Analysis. In ICIAP 2023.

MSc · 1 paper

Early work from my master’s research.

Samuele Poppi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara. Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis. In CVPRW 2021.

Talks & Lectures

Talk · Feb 9, 2026

Not All Attackers Are Malicious: When Safety Degrades Without Harmful Intent

A talk on how safety can silently degrade when models are adapted after deployment, even without malicious intent.

Talk · Nov 20, 2025

Who Defines Safety for T2I Diffusion Models?

A lecture on who defines safety for text-to-image models and why human authority matters in generative AI.

Talk · May 27, 2025

From Text to Vision: Ensuring Responsibility and Safety in Modern AI

A lecture on attacks, defenses, and responsible safety mechanisms for language and multimodal AI.

Contact

I am always happy to discuss AI safety, robust evaluation, trustworthy multimodal systems, and collaborations around model behavior under adaptation.

Email me · Google Scholar · GitHub · LinkedIn