
Test your best methods on our hard CoT interp tasks

Zac Boring March 26, 2026 1 min read

Authors: Daria Ivanova, Riya Tyagi, Arthur Conmy, Neel Nanda

Daria and Riya are co-first authors. This work was done during Neel Nanda’s MATS 9.0. Claude helped write code and suggest edits for this post.

TL;DR: One of our best safety techniques right now is “just read the chain of thought”. But this isn’t always enough: can we learn more by going beyond just reading the reasoning? Yet it's such an effective technique that it's hard to tell if we have made much progress on improving methods. To help t

By daria

Read the full article at Alignment Forum →