Who Defines Safety for T2I Diffusion Models?

Nov 20, 2025 · Samuele Poppi
Abstract
Ensuring safety in text-to-image (T2I) diffusion models remains a fundamental yet unresolved challenge. Existing approaches—ranging from post-generation filtering to guidance-based mechanisms and constitution-in-the-weights fine-tuning—rely on fixed concept lists, dataset-driven proxies, or model-internal heuristics, each introducing brittleness, bias, or utility degradation. Moreover, current systems implicitly delegate safety definitions to developers or training data, leaving unclear who ultimately determines acceptable model behavior. This work surveys the conceptual and technical foundations of safety for T2I models, analyzing limitations of prevailing methods and drawing connections to recent alignment strategies such as Constitutional AI. I argue that robust safety mechanisms require explicit, human-supervised normative frameworks and discuss open questions regarding how such constitutions should be constructed, compared, and integrated into generative pipelines. My goal is to outline a principled research agenda for developing controllable T2I systems in which humans retain meaningful authority over safety decisions.
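To make the critique of guidance-based mechanisms concrete, here is a minimal sketch of how such a mechanism typically appears in practice, written against the Hugging Face diffusers API. The model ID and the concept list are illustrative assumptions, not the method proposed in this talk; the point is that the "safety definition" reduces to a fixed, developer-chosen string.

```python
# Minimal sketch of a guidance-based safety mechanism, assuming the
# Hugging Face diffusers API and a CUDA device. The base model and the
# unsafe-concept list below are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # hypothetical choice of base model
    torch_dtype=torch.float16,
).to("cuda")

# A fixed, developer-defined concept list acting as the de facto safety
# "constitution": no external or human-supervised norm validated this choice.
unsafe_concepts = "violence, gore, nudity"

image = pipe(
    prompt="a crowded street market at dusk",
    negative_prompt=unsafe_concepts,   # steers denoising away from the listed concepts
    guidance_scale=7.5,
    num_inference_steps=50,
).images[0]
image.save("market.png")
```

Everything about what counts as "unsafe" here lives in one hard-coded string, which is exactly the brittleness and implicit delegation of authority the abstract describes.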
Date
Nov 20, 2025 2:00 PM — 4:00 PM
Event
Machine Learning Security - PhD Program in Artificial Intelligence
Location
Mohamed bin Zayed University of Artificial Intelligence
Masdar City, Abu Dhabi, United Arab Emirates