Not All Attackers Are Malicious: When Safety Degrades Without Harmful Intent

Feb 9, 2026 · Samuele Poppi
Abstract
AI safety is often framed through a classical security model: a malicious attacker tries to exploit a static system, while a defender tries to prevent abuse. This talk challenges that framing by focusing on unintentional failures that emerge during normal model evolution. Using text-to-image diffusion models as a case study, I discuss how benign fine-tuning, personalization, and domain adaptation can erode safety alignment after deployment. The talk introduces the SPQR perspective, which considers safety, prompt adherence, quality, and robustness together, and argues that safety evaluation should track continuously evolving systems rather than only frozen checkpoints.
Date
Feb 9, 2026
Event
MBZUAI Symposium on Security in the Age of AI
Location

Mohamed bin Zayed University of Artificial Intelligence, Masdar City, Abu Dhabi

Slides from the talk are available below.