Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)

Zac Boring · May 11, 2026 · 1 min read

1.1 Tl;dr

Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people’s agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. But that last one, manipulation, points to a challenge for all these desiderata: a human’s goals are themselves under-determined and manipulable, and it’s awfully hard to pin down a principled distinction between changing people’s goals in a good way (“providing counsel”, “prov

By Steven Byrnes

Read the full article at Alignment Forum →