A trace-level diagnostic for safety failures that terminal refusal scores can miss in multi-turn reasoning models.
Jun 9, 2026