Name: Not All Attackers Are Malicious: When Safety Degrades Without Harmful Intent
Start: 2026-02-09T00:00:00Z
Location: Mohamed bin Zayed University of Artificial Intelligence

Not All Attackers Are Malicious: When Safety Degrades Without Harmful Intent

Feb 9, 2026 · Samuele Poppi

Abstract

AI safety is often framed through a classical security model: a malicious attacker tries to exploit a static system, while a defender tries to prevent abuse. This talk challenges that framing by focusing on unintentional failures that emerge during normal model evolution. Using text-to-image diffusion models as a case study, I discuss how benign fine-tuning, personalization, and domain adaptation can erode safety alignment after deployment. The talk introduces the SPQR perspective (safety, prompt adherence, quality, and robustness) and argues that safety evaluation should track continuously evolving systems rather than only frozen checkpoints.

Date

Feb 9, 2026

Event

MBZUAI Symposium on Security in the Age of AI

Location

Mohamed bin Zayed University of Artificial Intelligence

Masdar City, Abu Dhabi

Slides from the talk are available below.

Last updated on Feb 9, 2026