
A trace-level diagnostic for safety failures that terminal refusal scores can miss in multi-turn reasoning models.
Jun 9, 2026

A talk on how safety can silently degrade when models are adapted after deployment, even without malicious intent.
Feb 9, 2026