“Act-based approval-directed agents”, for IDA skeptics

Zac Boring, March 18, 2026

Summary / tl;dr

In the 2010s, Paul Christiano built an extensive body of work on AI alignment (see the “Iterated Amplification” series for a curated overview as of 2018). One foundation of this program was an intuition that it should be possible to build “act-based approval-directed agents” (“approval-directed agents” for short). These AGIs, for example, would not lie to their human supervisors, because their human supervisors wouldn’t want them to lie, and these AGIs would only do things that their …

By Steven Byrnes

Read the full article at Alignment Forum →