Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
Analysis

Sycophancy Towards Researchers Drives Performative Misalignment

Zac Boring · March 18, 2026 · 1 min read

This work was done by Rustem Turtayev, David Vella Zarb, and Taywon Min during MATS 9.0, mentored by Shi Feng, based on prior work by David Baek. We are grateful to our research manager Jinghua Ou for helpful suggestions on this blog post.

Introduction

Alignment faking, originally hypothesized to emerge as a result of self-preservation, is now observed in frontier models: perceived monitoring makes them more aligned than they otherwise would be. However, some details of the eval raise questions: w

By Taywon Min

Read the full article at LessWrong AI →