Postdoctoral Associate · AI Safety

Samuele Poppi

I study how AI systems change after deployment, adaptation, and misuse.

My work connects safety control, mechanistic understanding, and trustworthy evidence for language, vision-language, and generative models after they leave the lab.

Featured Work

Recent work at the intersection of AI safety, robustness, and trustworthy generative models.

Papers & Preprints

Author in bold is me; underlined authors are students I advised or co-advised on that specific project.
7 papers

Postdoc

Current work from MBZUAI and post-PhD collaborations.
A Gravitational Interpretation of Fine-Tuning Reversion. In arXiv 2026.
When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models. In arXiv 2026.
SPQR: A Multi-Dimensional Benchmark for Safety Alignment under Benign Model Adaptation. In ECCV 2026.
Robust Safety Monitoring of Language Models via Activation Watermarking. In arXiv 2026.
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Output Prefilling. In ICPR 2026.
Robust and Calibrated Detection of Authentic Multimedia Content. In arXiv 2025.
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection. In arXiv 2025.
6 papers

PhD

Work from my PhD years on safe, explainable, and trustworthy AI.
Unlearning Vision Transformers without Retaining Data via Low-Rank Decompositions. In ICPR 2024.
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks. In NAACL 2025.
Multiclass Unlearning for Image Classification via Weight Filtering. In IEEE Intelligent Systems 2024.
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models. In ECCV 2024.
Towards Explainable Navigation and Recounting. In ICIAP 2023.
Unveiling the Impact of Image Transformations on Deepfake Detection: An Experimental Analysis. In ICIAP 2023.
1 paper

MSc

Early work from my master’s research.

Talks & Lectures

Contact

I am always happy to discuss AI safety, robust evaluation, trustworthy multimodal systems, and collaborations around model behavior under adaptation.