
Postdoctoral Associate · AI Safety
I study how AI systems change after deployment, adaptation, and misuse.
My work connects safety control, mechanistic understanding, and trustworthy evidence for language, vision-language, and generative models after they leave the lab.
Three connected views of the same question: how do we keep AI systems reliable when deployment changes the model, the context, and the evidence we can trust?
01 Control mechanisms and evaluations that survive fine-tuning, adaptation, and downstream customization.
02 Mechanistic and geometric views of how models shift under multilingual attacks and post-alignment updates.
03 XAI, unlearning, watermarking, and multimedia forensics as tools for evidence under uncertainty.
Recent work at the intersection of AI safety, robustness, and trustworthy generative models.
I am always happy to discuss AI safety, robust evaluation, trustworthy multimodal systems, and collaborations around model behavior under adaptation.