
New ARENA material: 8 exercise sets on alignment science & interpretability

Zac Boring · February 27, 2026

TLDR

This is a post announcing a lot of new ARENA material I've been working on for a while, which is now available for study here (currently on the alignment-science branch, but planned to be merged into main this Sunday).

There's a set of exercises (each one contains about 1-2 days of material) on the following topics:

- Linear Probes (replication of the "Geometry of Truth" paper, plus Apollo's "Probing for Deception" work)
- Activation Oracles (based around this demo notebook, with additional exercis…

Read the full article at LessWrong AI →