Measuring and improving coding audit realism with deployment resources

Zac Boring · March 24, 2026

TL;DR: We study realism win rate, a metric for how distinguishable Petri audit transcripts are from real deployment interactions. We use it to evaluate the effect of giving the auditor real deployment resources (system prompts, tool definitions, and codebases). Providing these resources increases the average realism win rate for reward hacking audits from 4.6% to 32.8%, without significantly changing the hack rate. Research done as part of the Anthropic Fellows Program.
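
As a rough illustration of what a win-rate metric of this kind might look like, the sketch below assumes a pairwise setup: a judge is shown one audit transcript and one real deployment transcript and picks the one that looks more realistic, and the win rate is the fraction of comparisons the audit transcript wins. This definition, the function name `realism_win_rate`, and the sample judgments are all assumptions made for illustration; the summary above does not spell out the exact procedure.

```python
from typing import Sequence


def realism_win_rate(judge_picked_audit: Sequence[bool]) -> float:
    """Fraction of pairwise comparisons won by the audit transcript.

    Assumed setup (not confirmed by the source): each entry is True when
    a judge, shown one audit transcript and one real deployment
    transcript, picked the audit transcript as the more realistic one.
    """
    if not judge_picked_audit:
        raise ValueError("need at least one comparison")
    return sum(judge_picked_audit) / len(judge_picked_audit)


# Hypothetical judgments, made up for illustration only.
example_judgments = [True, False, False, False]
print(f"realism win rate: {realism_win_rate(example_judgments):.1%}")
```

Under this assumed definition, a win rate near 50% would mean the judge cannot tell audit transcripts from real ones, so a higher number indicates more realistic audits.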

By Connor Kissane

Read the full article at LessWrong AI →