Predicting When RL Training Breaks Chain-of-Thought Monitorability

Zac Boring April 1, 2026 1 min read

Read our full paper about this topic by Max Kaufmann, David Lindner, Roland S. Zimmermann, and Rohin Shah.

Overseeing AI agents by reading their intermediate reasoning “scratchpad” is a promising tool for AI safety. This approach, known as Chain-of-Thought (CoT) monitoring, allows us to check what a model is thinking before it acts, often helping us catch concerning behaviors like reward hacking and scheming.

However, CoT monitoring can fa

By David Lindner

Read the full article at Alignment Forum →