Research | Samuele Poppi

Research Compass

I study AI systems at the point where deployment changes them: fine-tuning, adaptation, misuse, new languages, new modalities, and new incentives.

My work is driven by a simple concern: alignment is not a one-time property. A model can look safe before release and become brittle after ordinary updates, downstream customization, or adversarial pressure. I am interested in the mechanisms behind that shift, and in evaluations that make those failures visible before they matter.

I like research that connects control with understanding. Benchmarks are useful when they expose a real deployment failure mode; interpretability is useful when it changes what we can predict or prevent; defenses are useful when they survive contact with adaptation.

What Guides Me

Safety must survive adaptation. Models are rarely frozen after release, so control mechanisms should be evaluated under benign and adversarial change.
Understanding should be actionable. Mechanistic insight matters most when it helps diagnose, predict, or intervene on model behavior.
Evidence beats surface behavior. Final answers, refusal rates, and aggregate scores can hide important failures; useful evaluations inspect traces, representations, and distributions.
Trustworthy AI is a systems problem. Safety, privacy, authenticity, and unlearning interact across training data, model internals, deployment interfaces, and human oversight.

Current Threads

AI Control Across Deployment

Methods and evaluations for keeping language, vision-language, and generative models aligned before release and robust after fine-tuning, adaptation, or policy shifts.

The Mechanics of Model Change

Interpreting how fine-tuning changes models, from multilingual LLM safety to gravitational interpretations of fine-tuning reversion and future mechanistic studies.

Evidence, Forgetting & Forensics

Tools to explain decisions, remove learned information, and assess synthetic media, connecting XAI, unlearning, and deepfake robustness under one reliability lens.
