
Classifier Context Rot: Monitor Performance Degrades with Context Length

Zac Boring · May 18, 2026

Monitoring coding agents for dangerous behavior using language models requires classifying transcripts that often exceed 500K tokens, but prior agent monitoring benchmarks rarely contain transcripts longer than 100K tokens. We show that when used as classifiers, current frontier models fail to notice dangerous actions more often in longer transcripts. In particular, on MonitorBench, Opus 4.6, GPT 5.4, and Gemini 3.1 miss these actions 2x to 30x more often when we prepend 800K tokens of benign act

By Fabien Roger

Read the full article at LessWrong AI →