From Text to Vision: Ensuring Responsibility and Safety in Modern AI

May 27, 2025
Samuele Poppi
Abstract
As generative AI systems grow in complexity and adoption, ensuring their safety and alignment becomes both more critical and more challenging. This talk explores the fragility of modern AI models, from text-only large language models (LLMs) to multimodal architectures, when exposed to adversarial manipulation. We begin by examining red-teaming techniques that reveal how easily current LLMs can be jailbroken through methods such as prompt injection, character roleplay, and fine-tuning attacks. We then present recent findings showing that multilingual safety alignment can be compromised even by monolingual fine-tuning, highlighting the presence of language-agnostic safety parameters. To investigate this phenomenon, we introduce Safety Information Localization (SIL), a method to identify and manipulate the minimal set of parameters encoding safety-critical behavior. Finally, we transition to multimodal models and demonstrate Safe-CLIP, a framework that directly edits the CLIP embedding space to suppress harmful visual-textual associations. Together, these insights suggest that model safety must be embedded at a fundamental level, across languages, modalities, and representations, to ensure robust and responsible AI systems.
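
As a rough illustration of the embedding-space view behind Safe-CLIP, the minimal sketch below uses the public Hugging Face CLIP API to quantify how strongly an image aligns with a benign versus a harmful text prompt. The model name, image path, and prompts are illustrative assumptions; the sketch only measures visual-textual associations and does not implement Safe-CLIP's actual editing procedure.

```python
# Hypothetical sketch: measuring text-image alignment in CLIP's shared embedding
# space. Safe-CLIP (per the abstract) edits this space to weaken harmful
# visual-textual associations; here we only show how such an association can be
# quantified. Model checkpoint, image path, and prompts are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder: any local image file
texts = ["a safe description", "a harmful description"]  # placeholder prompts

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])

# Cosine similarity between the image and each prompt. An embedding-space edit
# in the spirit of Safe-CLIP aims to lower the similarity for harmful prompts
# while preserving it for benign ones.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
similarity = image_emb @ text_emb.T
print(similarity)
```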
Date
May 27, 2025 2:00 PM — 4:00 PM
Event
Responsible Generative AI - National PhD Program in AI For Society
Location
Scuola Normale Superiore di Pisa
7 Piazza dei Cavalieri, Pisa, Tuscany 56126

Authors
Samuele Poppi
Postdoctoral Associate - AI Safety