Tracking AI existential risk. Auto-aggregated headlines. Human-curated analysis.
Analysis

Against Corrigibility

Zac Boring · June 7, 2026

A “corrigible” agent, per the LW wiki, is:

…one that doesn’t interfere with what we would intuitively see as attempts to ‘correct’ the agent, or ‘correct’ our mistakes in building it; and permits these ‘corrections’ despite the apparent instrumentally convergent reasoning saying otherwise.

Most talk about corrigibility (henceforth without scarequotes) has focused on the fact that it seems diffic

By peralice

Read the full article at LessWrong AI →